Jncraton

Rank:

Average Model Cost: $0.0000

Number of Runs: 43,408

Models by this creator

all-MiniLM-L6-v2-ct2-int8

The all-MiniLM-L6-v2-ct2-int8 model is a sentence-similarity model for measuring how semantically close two sentences are. It is based on the 6-layer MiniLM architecture and fine-tuned on a large corpus of text. The model is optimized for efficient inference and quantized to int8 precision, making it suitable for deployment on resource-constrained devices or systems.
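To make the sentence-similarity task concrete, here is a minimal sketch using the original sentence-transformers/all-MiniLM-L6-v2 checkpoint, which this listing appears to be a quantized conversion of; loading the int8 variant itself would go through its own runtime, so the code below is illustrative only.

    # Minimal sketch, assuming the original sentence-transformers/all-MiniLM-L6-v2
    # checkpoint; the ct2-int8 listing is a quantized conversion of this model.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    sentences = ["A cat sits on the mat.", "A feline is resting on a rug."]
    embeddings = model.encode(sentences, normalize_embeddings=True)

    # Cosine similarity of the two sentence embeddings (closer to 1.0 = more similar)
    print(float(util.cos_sim(embeddings[0], embeddings[1])))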

Cost: $-/run · Runs: 17.8K · Platform: Hugging Face

flan-alpaca-xl-ct2-int8

The flan-alpaca-xl-ct2-int8 model extends the Stanford Alpaca synthetic instruction-tuning method to other instruction-tuned models such as Flan-T5. The approach uses instructions generated by a large language model (LLM) such as GPT-3 as synthetic training data to fine-tune a smaller model. It offers an alternative to the original Alpaca implementation, which has licensing constraints and potential noise in its synthetic dataset. The flan-alpaca-xl-ct2-int8 model is fully accessible and trained on high-quality instructions.

Cost: $-/run · Runs: 2.1K · Platform: Hugging Face

codet5p-220m-py-ct2-int8

CodeT5+ 220M (further tuned on Python)

Model description: CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (encoder-only, decoder-only, and encoder-decoder) to support a wide range of code understanding and generation tasks. It was introduced in the paper "CodeT5+: Open Code Large Language Models for Code Understanding and Generation" by Yue Wang*, Hung Le*, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, and Steven C.H. Hoi (* indicates equal contribution). Compared to the original CodeT5 family (base: 220M, large: 770M), CodeT5+ is pretrained with a diverse set of pretraining tasks, including span denoising, causal language modeling, contrastive learning, and text-code matching, to learn rich representations from both unimodal code data and bimodal code-text data. Additionally, it employs a simple yet compute-efficient pretraining method that initializes model components with frozen off-the-shelf LLMs such as CodeGen to efficiently scale up the model (to 2B, 6B, and 16B), and adopts a "shallow encoder and deep decoder" architecture. Furthermore, it is instruction-tuned to align with natural language instructions (InstructCodeT5+ 16B) following Code Alpaca.

How to use: This model can be loaded using the T5ForConditionalGeneration class and uses the same tokenizer as the original CodeT5.

Pretraining data: This checkpoint is trained on the stricter permissive subset of the deduplicated version of the github-code dataset. The data is preprocessed by keeping only permissively licensed code ("mit", "apache-2", "bsd-3-clause", "bsd-2-clause", "cc0-1.0", "unlicense", "isc"). Supported languages (9 in total) are: c, c++, c-sharp, go, java, javascript, php, python, ruby.

Training procedure: This checkpoint is first trained on the multilingual unimodal code data during first-stage pretraining, which includes a diverse set of pretraining tasks such as span denoising and two variants of causal language modeling. It is then further trained on the Python subset with the causal language modeling objective for another epoch to better adapt it to Python code generation. Please refer to the paper for more details.

Evaluation results: CodeT5+ models have been comprehensively evaluated on a wide range of code understanding and generation tasks in zero-shot, finetuning, and instruction-tuning settings. CodeT5+ yields substantial performance gains over SoTA baselines on many downstream tasks, e.g., 8 text-to-code retrieval tasks (+3.2 avg. MRR), 2 line-level code completion tasks (+2.1 avg. Exact Match), and 2 retrieval-augmented code generation tasks (+5.8 avg. BLEU-4). In 2 math programming tasks on MathQA-Python and GSM8K-Python, CodeT5+ models below a billion parameters significantly outperform many LLMs of up to 137B parameters. In particular, on the zero-shot text-to-code generation task on the HumanEval benchmark, InstructCodeT5+ 16B sets new SoTA results of 35.0% pass@1 and 54.5% pass@10 among open code LLMs, even surpassing the closed-source OpenAI code-cushman-001 model. Please refer to the paper for more details. This specific checkpoint achieves 12.0% pass@1 on HumanEval in the zero-shot setting, outperforming much larger LLMs such as InCoder 1.3B (8.9%), GPT-Neo 2.7B (6.4%), and GPT-J 6B (11.6%).
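The usage note above mentions loading via T5ForConditionalGeneration; here is a minimal sketch, assuming the upstream Salesforce/codet5p-220m-py checkpoint on the Hugging Face Hub. The codet5p-220m-py-ct2-int8 listing is a quantized conversion of it and would be loaded through its own runtime instead.

    # Minimal sketch, assuming the upstream Salesforce/codet5p-220m-py checkpoint;
    # the codet5p-220m-py-ct2-int8 listing is a quantized conversion of this model.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    checkpoint = "Salesforce/codet5p-220m-py"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)

    # Complete a Python function signature.
    inputs = tokenizer("def print_hello_world():", return_tensors="pt")
    outputs = model.generate(**inputs, max_length=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))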

Cost: $-/run · Runs: 87 · Platform: Hugging Face

flan-t5-large-ct2-int8

Model Card for FLAN-T5 large

TL;DR: If you already know T5, FLAN-T5 is simply better at everything. For the same number of parameters, these models have been fine-tuned on more than 1,000 additional tasks covering more languages, as noted in the first lines of the paper's abstract. Disclaimer: content from this model card was written by the Hugging Face team, and parts of it were copied from the T5 model card.

Model Details: Model type: language model. Language(s) (NLP): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian. License: Apache 2.0. Related models: all FLAN-T5 checkpoints. Original checkpoints: all original FLAN-T5 checkpoints. Resources for more information: the research paper, the GitHub repo, and the Hugging Face FLAN-T5 docs (similar to T5).

Usage: The original card provides example scripts for using the model in transformers: using the PyTorch model, running the model on a CPU, running the model on a GPU, and running the model on a GPU with different precisions. A minimal example is sketched below.

Uses: For direct use and downstream use, see the research paper for further details. Out-of-scope use: more information needed.

Bias, Risks, and Limitations: The information in this section (ethical considerations and risks, known limitations, and sensitive use) is copied from the model's official model card.

Training Details: The model was trained on a mixture of tasks, including those described in the original paper (Figure 2). According to the original model card, the model was trained on TPU v3 or TPU v4 pods using the t5x codebase together with JAX.

Evaluation: The authors evaluated the model on various tasks covering several languages (1,836 in total); see the research paper for the quantitative evaluation. For full results for FLAN-T5-Large, see the research paper, Table 3.

Environmental Impact: Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). Hardware type: Google Cloud TPU Pods (TPU v3 or TPU v4), number of chips ≥ 4. Cloud provider: GCP. Hours used, compute region, and carbon emitted: more information needed.
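The usage section above refers to example transformers scripts; here is a minimal CPU sketch, assuming the upstream google/flan-t5-large checkpoint. The flan-t5-large-ct2-int8 listing is a quantized conversion that would be loaded through its own runtime instead.

    # Minimal CPU sketch, assuming the upstream google/flan-t5-large checkpoint;
    # the flan-t5-large-ct2-int8 listing is a quantized conversion of this model.
    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
    model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")

    # FLAN-T5 follows natural-language instructions directly.
    input_ids = tokenizer("translate English to German: How old are you?",
                          return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))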

Cost: $-/run · Runs: 57 · Platform: Hugging Face

flan-alpaca-gpt4-xl-ct2-int8

🍮 🦙 Flan-Alpaca: Instruction Tuning from Humans and Machines

📣 Curious about the performance of 🍮 🦙 Flan-Alpaca on the large-scale LLM evaluation benchmark InstructEval? Read our paper: https://arxiv.org/pdf/2306.04757.pdf. We evaluated more than 10 open-source instruction-tuned LLMs belonging to various LLM families, including Pythia, LLaMA, T5, UL2, OPT, and Mosaic. Code and datasets: https://github.com/declare-lab/instruct-eval

📣 FLAN-T5 is also useful in text-to-audio generation. Find our work at https://github.com/declare-lab/tango if you are interested.

Our repository contains code for extending the Stanford Alpaca synthetic instruction tuning to existing instruction-tuned models such as Flan-T5. We have a live interactive demo thanks to Joao Gante! We are also benchmarking many instruction-tuned models at declare-lab/flan-eval. Our pretrained models are fully available on Hugging Face 🤗.

Why? Alpaca represents an exciting new direction for approximating the performance of large language models (LLMs) like ChatGPT cheaply and easily. Concretely, it leverages an LLM such as GPT-3 to generate instructions as synthetic training data. The synthetic data, which covers more than 50k tasks, can then be used to fine-tune a smaller model. However, the original implementation is less accessible due to licensing constraints of the underlying LLaMA model. Furthermore, users have noted potential noise in the synthetic dataset. Hence, it may be better to explore a fully accessible model that is already trained on high-quality (but less diverse) instructions such as Flan-T5. A usage sketch is given below.
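The repository's usage section demonstrates loading the models with the transformers pipeline; here is a minimal sketch, assuming the upstream declare-lab/flan-alpaca-gpt4-xl checkpoint. The flan-alpaca-gpt4-xl-ct2-int8 listing is a quantized conversion of it and would be loaded through its own runtime instead.

    # Minimal sketch, assuming the upstream declare-lab/flan-alpaca-gpt4-xl
    # checkpoint; the flan-alpaca-gpt4-xl-ct2-int8 listing is a quantized
    # conversion of this model.
    from transformers import pipeline

    generator = pipeline("text2text-generation", model="declare-lab/flan-alpaca-gpt4-xl")

    prompt = "Write an email about an alpaca that likes flan"
    result = generator(prompt, max_length=128, do_sample=True)
    print(result[0]["generated_text"])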

Cost: $-/run · Runs: 50 · Platform: Hugging Face
