ConvoKit: Conversational Analysis Toolkit

This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a single unified interface inspired by (and compatible with) scikit-learn. Several large conversational datasets are included, together with scripts that demonstrate the toolkit on them. The latest version is 4.1.2 (released June 26, 2026); follow the project on GitHub to keep track of updates.
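To make the "scikit-learn-inspired interface" concrete, here is a minimal sketch of the fit/transform convention in plain Python. It does not use ConvoKit itself: `ToyUtterance` and `WordCountAnnotator` are hypothetical names invented for illustration, and the real toolkit's classes and signatures should be taken from its documentation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyUtterance:
    """Hypothetical stand-in for a single conversational turn (not a ConvoKit class)."""
    speaker: str
    text: str
    meta: dict = field(default_factory=dict)


class WordCountAnnotator:
    """Hypothetical transformer that follows the scikit-learn fit/transform convention."""

    def __init__(self, output_field="relative_length"):
        self.output_field = output_field
        self.mean_length_ = None  # learned in fit()

    def fit(self, utterances):
        # Learn the average utterance length (in words) from the data.
        lengths = [len(u.text.split()) for u in utterances]
        self.mean_length_ = sum(lengths) / len(lengths)
        return self  # returning self allows call chaining, as in scikit-learn

    def transform(self, utterances):
        # Annotate each utterance with its length relative to the learned mean.
        if self.mean_length_ is None:
            raise RuntimeError("call fit() before transform()")
        for u in utterances:
            u.meta[self.output_field] = len(u.text.split()) / self.mean_length_
        return utterances

    def fit_transform(self, utterances):
        # Convenience wrapper: learn from the data, then annotate it.
        return self.fit(utterances).transform(utterances)


conversation = [
    ToyUtterance("alice", "hello there"),
    ToyUtterance("bob", "hi alice how are you today"),
]
annotated = WordCountAnnotator().fit_transform(conversation)
for utt in annotated:
    print(utt.speaker, utt.meta)
```

The point of the pattern is that every analysis step exposes the same small set of methods and writes its results back onto the data as metadata, so steps can be swapped or chained without changing the surrounding code.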

Quick Links

Installation - Get started with ConvoKit
Datasets - Browse available conversational datasets
Tools - Explore analysis features and APIs
Documentation
GitHub Repository
Discord Community

Documentation

Documentation is hosted here.

If you are new to ConvoKit, great places to get started are:

The Core Concepts tutorial for an overview of the ConvoKit “philosophy” and object model
The High-level tutorial for a walkthrough of how to import ConvoKit into your project, load a Corpus, and use ConvoKit functions

For an overview, watch our SIGDIAL talk introducing the toolkit.

Community & Support

Join our Discord community to:

Get help with installation and usage
Stay updated on the latest releases
Discuss progress, features, and issues
Share your work and connect with others

Citation

If you use the code or datasets distributed with ConvoKit, please acknowledge the work tied to the respective component (indicated in the documentation) in addition to:

Jonathan P. Chang, Caleb Chiam, Liye Fu, Andrew Wang, Justine Zhang, Cristian Danescu-Niculescu-Mizil. 2020. “ConvoKit: A Toolkit for the Analysis of Conversations”. Proceedings of SIGDIAL.

Funding

ConvoKit is funded in part by the U.S. National Science Foundation under Grant No. IIS-1750615 (CAREER). Any opinions, findings, and conclusions in this work are those of the author(s) and do not necessarily reflect the views of Cornell University or the National Science Foundation.