11-811: Interdisciplinary NLP: Language Modeling in the Wild

Course page for 11-811

Fall 2026  ·  Tuesdays & Thursdays, 12:30–1:50pm  ·  Wean Hall 6403

Instructors: Emma Strubell and Clara Na


Overview

Recent advances in natural language processing (NLP), primarily powered by large language models (LLMs), show great potential for enabling advanced analysis of unstructured and semi-structured documents across a diverse array of applications, from accelerating scientific discovery by automatically analyzing materials science research literature to facilitating the study of how narrative arcs evolved in 20th-century literature.

Historically, successful real-world deployment has often required deliberate adaptation: careful definition of the task, curation of new or existing datasets, experimentation to identify strengths and limitations of existing off-the-shelf affordances, and/or consideration of computational and financial feasibility. On the other hand, recent developments in language technologies have included both 1) meaningful capability improvements in many settings that until recently were outside the scope of existing tools, and 2) lowered barriers to use and adaptation of language technologies.

In this class, students with concentrations outside of NLP (e.g. degree programs in materials science, English, …) and students with concentrations in or near NLP (LTI, MLD or equivalent expertise) will work with and learn from each other, to characterize and bridge gaps between the promise of modern language technologies and the successful deployment of these tools for real-world applications. Together, students will explore:

  • Technical foundations for using language technologies, AI literacy and effective science communication;
  • Identifying strengths and limitations of various approaches for adaptation to a specific domain or setting;
  • Acquiring and curating data appropriate to a specific task or evaluation; and
  • Devising and executing a plan to accomplish research and analysis tasks given a goal.

Who are you?

This class is likely a good fit for you if either of the following descriptions applies to you.

Group A: You are a student in a discipline outside of ML/NLP (e.g. a sufficiently different discipline within computing such as programming languages, or an entirely separate discipline such as English, biology or design), and you are interested in using language technologies (e.g. machine learning with text data, LLMs) for your work. You do not need to have a specific use case yet – part of the course’s objective will be to refine a research question in the context of available resources and technology – but you should have a general understanding of what it looks like to do research in your discipline.

Group B: You are a student “in NLP” – i.e. actively engaged in NLP research through LTI faculty and/or coursework or similar, and interested in any or all of: 1) interdisciplinary research and communication, 2) domain adaptation and generalization, especially in practice, and 3) understanding common gaps between research and practice. You do not need to have a specific domain of interest yet, but you should be open to working with domain experts to accomplish a shared goal.

There may be other, more appropriate courses for you if:

  • You are more interested in a general survey of data science and statistical analysis tools; this course has an explicit emphasis on text as data.
  • You are more interested in a general introduction to NLP (take 11-611 or 11-711), without having a specific domain or tentative domain-specific goal in mind. (That being said, 11-611/711 is not a prerequisite for this course.)

A note on programming experience

Previous programming or coding experience is very helpful, but not a strict requirement. For example, students who have experience using statistical analysis software but have not spent time writing scripts or programs themselves may find the material approachable. It is explicitly not a requirement that you have completed a degree in computing or significant computational coursework, but all students will be expected to write code and conduct quantitative analyses throughout the course, and we plan to support students in learning these skills as the course progresses. Both course staff and willing students will be available to assist with challenges such as debugging software installations. If you are unsure whether this course is a good fit for you, please feel free to contact the instructors!

A detailed syllabus and schedule are forthcoming and will be posted to this page. Please feel free to reach out to the instructors with any questions in the meantime!

   
Lecture: Tuesdays & Thursdays, 12:30–1:50pm, Wean Hall 6403
Office Hours: TBD
Canvas: TBD
Piazza: TBD
Contact: Please use the course forum for questions. For private matters, email the instructors.

Schedule

Dates are tentative and subject to change. Readings and materials will be posted as the semester progresses.

Week Date Topic Readings Notes
1 Tue Aug 25 Course overview, syllabus, introductions, brief history of NLP
1 Thu Aug 27 Brief history of NLP; in-class exercise 0: Software setup + Python refresher
2 Tue Sep 1 In-class exercise 1: text analysis 101
2 Thu Sep 3 Intro Presentations Day 1    
3 Tue Sep 8      
3 Thu Sep 10      
4 Tue Sep 15      
4 Thu Sep 17      
5 Tue Sep 22      
5 Thu Sep 24      
6 Tue Sep 29      
6 Thu Oct 1      
7 Tue Oct 6      
7 Thu Oct 8      
8 Tue Oct 13 No Class — Fall Break    
8 Thu Oct 15      
9 Tue Oct 20      
9 Thu Oct 22      
10 Tue Oct 27      
10 Thu Oct 29      
11 Tue Nov 3      
11 Thu Nov 5      
12 Tue Nov 10      
12 Thu Nov 12      
13 Tue Nov 17      
13 Thu Nov 19      
14 Tue Nov 24      
14 Thu Nov 26 No Class — Thanksgiving    
15 Tue Dec 1      
15 Thu Dec 3      

Grading

Grades are based on a combination of individual and group assignments, including reflections, introductory presentations, labs, and deliverables for the semester-long class project.

Component Weight
Reflections 20
Introductory Presentations 8
Labs 36
Project 36

Reflections (20 points). (6 assignments; 4 points each. Lowest grade dropped. Individual.) Reflections are meant to encourage engagement with and reflection on class lectures, especially with respect to the goals and interests each student brought to the class. Specific prompts and questions will vary from reflection to reflection. More details below and in class.

Introductory Presentations (8 points). (1 assignment. Individual.) Students introduce themselves and their research interests. NLP students sign up to present on their own work and/or an adaptation method they are familiar with, and non-NLP students sign up to present on their own work and/or a dataset they are hoping to work with. All students identify goal(s) they are hoping to accomplish in taking the course. These introductory presentations will help students form teams for later labs and the course project.

Labs (36 points). (4 assignments; 9 points each. First lab is individual, the rest are group.) Labs are implementation- and analysis-heavy assignments (mostly Python/PyTorch) designed to give hands-on experience implementing the methodologies discussed in class. After the first lab, labs will be group assignments to be completed with project teams using the codebase being developed for your course project. All labs will have “tracks” or components for NLP students and non-NLP students.

Project (36 points). (Group.) A semester-long 2-4 person team project focused on carrying out a research goal within a particular domain of interest to people in a non-NLP discipline. There will be intermediate assignments and exercises (project proposal, project sharing, writing abstracts for each other’s publication audiences) as well as a final presentation and report. More details below and in class.
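
As a rough illustration of how the point values above add up, the following Python sketch totals a student's points. It is not an official grade calculator: the function names and the example scores are hypothetical, and it assumes that all six reflections are scored out of 4 (a missing one counting as 0) and that exactly one lowest reflection score is dropped.

```python
# Illustrative only: point values come from the grading breakdown above;
# function names and example scores are made up for this sketch.

REFLECTION_COUNT = 6   # six reflections, 4 points each, lowest dropped
REFLECTION_MAX = 4
LAB_COUNT = 4          # four labs, 9 points each
LAB_MAX = 9
INTRO_MAX = 8          # one introductory presentation
PROJECT_MAX = 36       # semester-long group project


def reflection_points(scores):
    """Sum the reflection scores after dropping the single lowest one."""
    if len(scores) != REFLECTION_COUNT:
        raise ValueError(f"expected {REFLECTION_COUNT} reflection scores")
    if any(not 0 <= s <= REFLECTION_MAX for s in scores):
        raise ValueError(f"reflection scores must be in 0..{REFLECTION_MAX}")
    return sum(scores) - min(scores)


def course_total(reflections, intro, labs, project):
    """Add up all four components; the maximum is 20 + 8 + 36 + 36 = 100."""
    if len(labs) != LAB_COUNT:
        raise ValueError(f"expected {LAB_COUNT} lab scores")
    if any(not 0 <= s <= LAB_MAX for s in labs):
        raise ValueError(f"lab scores must be in 0..{LAB_MAX}")
    if not 0 <= intro <= INTRO_MAX or not 0 <= project <= PROJECT_MAX:
        raise ValueError("intro or project score out of range")
    return reflection_points(reflections) + intro + sum(labs) + project


# Hypothetical student: the reflection scored 2 is the one that gets dropped.
total = course_total([4, 3, 4, 2, 4, 4], intro=7, labs=[9, 8, 8, 9], project=33)
print(total)  # 93
```

Because one reflection is dropped, only five of the six count, which is why the component is worth 20 points rather than 24.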

Reflections

  • Within the first two weeks of class, students are asked to individually define something they are hoping to accomplish given some data they may or may not already have. Initial reflections should consist of:
    • a broad objective (where students are already fairly certain that some subset or modified version of it should be feasible),
    • A, B, and C goals where feasibility and tractability are increasingly uncertain,
    • an imagined concrete plan for accomplishing each of the goals to the best of their knowledge, and
    • (optional / bonus in the initial reflection, required thereafter) an estimate of the costs of accomplishing these goals, in time, money, and/or other relevant considerations.
  • Two times during the remainder of the semester, students are asked to submit updated reflections that revisit the original reflection, adjusting plans and/or goals as necessary. Updated reflections should:
    • reassess feasibility and tractability of the original A, B, and C goals,
    • revise the previously proposed concrete plans to accomplish these goals (or a necessary modification of the goals), given new knowledge about language technologies and their strengths and limitations, and
    • make more informed estimates of relevant resource costs (e.g. time and money).
  • Two times during the middle of the semester, students will submit explorations of the “inverse” inquiry: given some language technology discussed during class, 1) what are some things you imagine it could be useful for? 2) what are some limitations you imagine you would encounter? and 3) empirically test some of the hypothesized strengths/limitations of the technology (bonus: propose alternative ways to address the limitations and/or discuss the tradeoffs of each method).
  • The final reflection is tied to the project and the overall experience of participating in the class alongside students with considerably different disciplinary backgrounds.

Labs

Details forthcoming.

Course projects

Details forthcoming.

You will form a group of 2-4 students, with at least one student from each of Group A and Group B. Together, you will propose and execute a project with guidance from each other as well as the course instructors. Bring your own data (BYOD): students from Group A will either have already acquired, or will acquire as part of their project, text data that they would like to analyze for their research. The course project will center on devising and executing a plan, alongside students from Group B, to automate this analysis using LLMs and other language technologies.

  • Identify capabilities and limitations of existing language technologies with respect to a specific practical use case.
  • Propose and execute steps to extend the capabilities of existing technologies to address limitations.
  • Analyze and describe gaps between the resulting capabilities and remaining questions of interest.

Milestones:

  • Proposal
  • Midway check-ins
  • Cross-disciplinary abstract writing exercise
  • Final presentation
  • Final report

Policies

Late Work: TBD

Academic Integrity: TBD

Accommodations: Students with disabilities who require accommodations should contact the Office of Disability Resources and notify the instructors early in the semester.

Wellness: Take care of yourself. CMU offers support through Counseling & Psychological Services (CaPS).