Average Model Cost: $0.0000
Number of Runs: 3,308
Models by this creator
German BERT large paraphrase cosine

This is a sentence-transformers model. It maps sentences and paragraphs (text) into a 1024-dimensional dense vector space. The model is intended to be used together with SetFit to improve German few-shot text classification. It has a sibling model, deutsche-telekom/gbert-large-paraphrase-euclidean. This model is based on deepset/gbert-large. Many thanks to deepset!

Loss Function
We used MultipleNegativesRankingLoss with cosine similarity as the loss function.

Training Data
The model is trained on a carefully filtered subset of deutsche-telekom/ger-backtrans-paraphrase. We deleted sentence pairs matching any of the following criteria:
- min_char_len less than 15
- jaccard_similarity greater than 0.3
- de_token_count greater than 30
- en_de_token_count greater than 30
- cos_sim less than 0.85

Hyperparameters
- learning_rate: 8.345726930229726e-06
- num_epochs: 7
- train_batch_size: 57
- num_gpu: ???

Evaluation Results
We use the NLU Few-shot Benchmark - English and German dataset to evaluate this model in a German few-shot scenario.

Qualitative results:
- multilingual sentence embeddings provide the worst results
- Electra models also deliver poor results
- the German BERT base-size model (deepset/gbert-base) provides good results
- the German BERT large-size model (deepset/gbert-large) provides very good results
- our fine-tuned models (this model and deutsche-telekom/gbert-large-paraphrase-euclidean) provide the best results

Licensing
Copyright (c) 2023 Philip May, Deutsche Telekom AG
Copyright (c) 2022 deepset GmbH

Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.
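The training-data filter above can be sketched as a small predicate. This is a hypothetical helper written for illustration, not code from the repository; the parameter names simply mirror the field names listed in the filter criteria.

```python
def keep_pair(min_char_len: int, jaccard_similarity: float,
              de_token_count: int, en_de_token_count: int,
              cos_sim: float) -> bool:
    """Return True if a sentence pair survives every filter criterion.

    A pair is deleted when ANY of the listed conditions holds, so it is
    kept only when all of them fail.
    """
    return (
        min_char_len >= 15           # deleted if shorter than 15 chars
        and jaccard_similarity <= 0.3  # deleted if too much word overlap
        and de_token_count <= 30       # deleted if German side too long
        and en_de_token_count <= 30    # deleted if back-translation too long
        and cos_sim >= 0.85            # deleted if pair is not similar enough
    )

# A pair passing all thresholds is kept; one failing any threshold is dropped.
keep_pair(20, 0.2, 25, 28, 0.90)  # → True
keep_pair(10, 0.2, 25, 28, 0.90)  # → False (too short)
```

Note that the cosine-similarity floor (cos_sim >= 0.85) keeps only near-paraphrases, which matches the paraphrase objective of the loss function.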
mT5-small-sum-de-en-v1

This is a bilingual summarization model for English and German. It is based on the multilingual T5 model google/mt5-small. This model is provided by the One Conversation team of Deutsche Telekom AG.

Training
Training was conducted with the following hyperparameters:
- base model: google/mt5-small
- source_prefix: "summarize: "
- batch size: 3
- max_source_length: 800
- max_target_length: 96
- warmup_ratio: 0.3
- number of train epochs: 10
- gradient accumulation steps: 2
- learning rate: 5e-5

Datasets and Preprocessing
The datasets were preprocessed as follows: each summary was tokenized with the google/mt5-small tokenizer, and only records whose summary contained no more than 94 tokens were selected.

The MLSUM dataset has a special characteristic: the summary is often included verbatim in the text as one or more sentences. These sentences have been removed from the texts, because we do not want to train a model that ultimately only extracts sentences as its summary.

This model is trained on the following datasets:

Evaluation
- Evaluation on MLSUM German Test Set (no beams)
- Evaluation on CNN Daily English Test Set (no beams)
- Evaluation on Extreme Summarization (XSum) English Test Set (no beams)

♣: These values seem unusually high. It could be that the test set was used in the training data.

License
Copyright (c) 2021 Philip May, Deutsche Telekom AG

This work is licensed under the Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) license.
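The MLSUM cleanup step described above can be sketched as follows. This is a hypothetical illustration of the idea, not the actual preprocessing code: sentences that appear verbatim in both the article and its summary are dropped from the article, so the model cannot learn to summarize by pure extraction.

```python
def remove_summary_sentences(text_sentences: list[str],
                             summary_sentences: list[str]) -> list[str]:
    """Drop article sentences that also occur verbatim in the summary.

    Both inputs are assumed to be pre-split into sentences; matching is
    done on whitespace-stripped exact equality (a simplification of
    whatever matching the real pipeline used).
    """
    summary_set = {s.strip() for s in summary_sentences}
    return [s for s in text_sentences if s.strip() not in summary_set]

article = [
    "Der Konzern meldet Rekordgewinne.",
    "Die Aktie stieg um fünf Prozent.",
    "Analysten zeigten sich überrascht.",
]
summary = ["Der Konzern meldet Rekordgewinne."]

cleaned = remove_summary_sentences(article, summary)
# The first sentence is removed because it appears verbatim in the summary.
```

After this step, the remaining text still conveys the story, but a purely extractive strategy can no longer reproduce the reference summary.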