
Segue W2v2 Base



SEGUE is a pre-training approach for sequence-level spoken language understanding (SLU) tasks. We use knowledge distillation on a parallel speech-text corpus (e.g. an ASR corpus) to distil language understanding knowledge from a textual sentence embedder to a pre-trained speech encoder. SEGUE applied to Wav2Vec 2.0 improves performance for many SLU tasks, including intent classification / slot-filling, spoken sentiment analysis, and spoken emotion classification. These improvements were observed in both fine-tuned and non-fine-tuned settings, as well as few-shot settings.

How to Get Started with the Model

To use this model checkpoint, you need to use the model classes on our GitHub repository. You do not need to create the Processor yourself; it is already available as model.processor.

SegueForRegression and SegueForClassification are also available. For classification, the number of classes can be specified through the n_classes field in model config, e.g. SegueForClassification.from_pretrained('declare-lab/segue-w2v2-base', n_classes=7). Multi-label classification is also supported, e.g. n_classes=[3, 7] for two labels with 3 and 7 classes respectively.

Pre-training and downstream task training scripts are available on our GitHub repository.

Results

We show only simplified MInDS-14 and MELD results for brevity. Please refer to the paper for full results.

MInDS-14 (intent classification)

Note: we used only the en-US subset of MInDS-14.
Note: Wav2Vec 2.0 fine-tuning was unstable. Only 3 out of 6 runs converged; the results shown were taken from converged runs only.

MELD (sentiment and emotion classification)

Note: Wav2Vec 2.0 fine-tuning was unstable at the higher LR.

Limitations

In the paper, we hypothesized that SEGUE may perform worse on tasks that rely less on understanding and more on word detection. This may explain why SEGUE did not manage to improve upon Wav2Vec 2.0 on the Fluent Speech Commands (FSC) task. We also experimented with an ASR task (FLEURS), which relies heavily on word detection, to further demonstrate this.

However, this does not mean that SEGUE performs worse on intent classification tasks in general. MInDS-14 was able to benefit greatly from SEGUE despite also being an intent classification task, as it has more free-form utterances that may benefit more from understanding.
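Distillation sketch

As a rough illustration of the distillation idea described in the overview (a speech encoder learning to match a textual sentence embedder on paired audio and transcripts), the sketch below computes a loss for one speech-text pair. It is not taken from the SEGUE repository: the mean pooling, the mean-squared-error objective, and the function names mean_pool, mse, and distillation_loss are all assumptions made for illustration, and the small hand-written vectors stand in for real encoder outputs.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level student features into one utterance-level vector.

    Assumption: mean pooling is used here only as a simple example.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between a student and a teacher embedding."""
    if len(student) != len(teacher):
        raise ValueError("embedding sizes must match")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def distillation_loss(
    speech_frames: Sequence[Sequence[float]],
    text_embedding: Sequence[float],
) -> float:
    """Loss for one (audio, transcript) pair.

    speech_frames: hypothetical frame-level outputs of the speech encoder (student).
    text_embedding: hypothetical sentence embedding of the transcript (teacher).
    """
    return mse(mean_pool(speech_frames), text_embedding)


# Toy stand-ins: three 2-dimensional speech frames and a 2-dimensional
# teacher embedding. Real encoders would produce much larger vectors.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
teacher = [1.0, 1.0]
loss = distillation_loss(frames, teacher)
```

In an actual training loop, a loss of this kind would be minimised with respect to the speech encoder's parameters only, with the text embedder acting as the fixed source of the target vectors. The pre-training scripts in the authors' repository are the reference for how SEGUE actually does this.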



Creator Models

Tango Full | $? | 5
Tango Full Ft Audiocaps | $? | 178
Flan Gpt4all Xl | $? | 428
Tango Full Ft Audio Music Caps | $? | 0
Flan Alpaca Gpt4 Xl | $? | 113,893



Summary of this model and related resources.

Model Name: Segue W2v2 Base
Model Link: View on HuggingFace
API Spec: View on HuggingFace
Github Link: No Github link provided
Paper Link: No paper link provided

