AI Squared

Average Model Cost: $0.0000

Number of Runs: 3,861

Models by this creator

dlite-v2-1_5b

Model Card for dlite-v2-1.5b

AI Squared's dlite-v2-1.5b is a large language model derived from OpenAI's largest GPT-2 model and fine-tuned on a corpus of 15k records (Databricks' "Dolly 15k" dataset) to help it exhibit chat-based capabilities. Just like Databricks' Dolly V2 models, dlite-v2-1.5b (and all other members of the dlite-v2 family) is licensed for both research and commercial use. We are extremely grateful to Databricks for creating the databricks-dolly-15k dataset; without it, we would not be able to create and release this model under such an open and permissive license. While dlite-v2-1.5b is not a state-of-the-art model, we believe the level of interactivity achievable on such a small, cheaply trained model is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought.

Model Description

Developed by: AI Squared, Inc.
Shared by: AI Squared, Inc.
Model type: Large Language Model
Language(s) (NLP): EN
License: Apache v2.0
Finetuned from model: GPT-2

Bias, Risks, and Limitations

dlite-v2-1.5b is not a state-of-the-art language model. It is an experimental technology, and as with any experimental technology, AI Squared urges potential users to test its capabilities thoroughly before use. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.

Usage

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers and accelerate libraries installed; see the install command in the sketch after this card. The instruction-following pipeline is loaded with the pipeline function, as sketched below. This loads a custom InstructionTextGenerationPipeline found in the model repo, which is why trust_remote_code=True is required. Including torch_dtype=torch.bfloat16 is generally recommended if this type is supported, in order to reduce memory usage; it does not appear to impact output quality, and it is fine to omit it if there is sufficient memory. You can then use the pipeline to answer instructions. Alternatively, if you prefer not to use trust_remote_code=True, you can download instruct_pipeline.py, store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer, as in the second sketch after this card.

Model Performance Metrics

We present results from various benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family, sorted by mean score, ascending. These metrics further show that none of the DLite models are state of the art; rather, they show that chat-like behaviors in LLMs can be trained almost independently of model size.

Limitations

DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.
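A minimal sketch of the usage described above, assuming the Hugging Face repo ID aisquared/dlite-v2-1_5b and the standard transformers pipeline API; the original card's exact snippets were lost here, so treat this as a reconstruction rather than the authoritative code:

```python
# Install dependencies first:
#   pip install "accelerate" "transformers[torch]" "torch"
import torch
from transformers import pipeline

# Loads the custom InstructionTextGenerationPipeline defined in the model repo,
# which is why trust_remote_code=True is required.
generate_text = pipeline(
    model="aisquared/dlite-v2-1_5b",  # assumed repo ID
    torch_dtype=torch.bfloat16,       # optional; omit if bfloat16 is unsupported
    trust_remote_code=True,
    device_map="auto",
)

# Use the pipeline to answer an instruction.
print(generate_text("Explain the difference between nuclear fission and fusion."))
```

And a sketch of the alternative that avoids trust_remote_code=True, assuming instruct_pipeline.py from the model repo has been saved next to your notebook:

```python
import torch
from instruct_pipeline import InstructionTextGenerationPipeline  # local copy from the model repo
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aisquared/dlite-v2-1_5b", padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    "aisquared/dlite-v2-1_5b", device_map="auto", torch_dtype=torch.bfloat16
)

# Construct the instruction pipeline directly from the loaded model and tokenizer.
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
print(generate_text("Who was George Washington?"))
```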

1.7K runs on Hugging Face

dlite-v2-774m

Model Card for dlite-v2-774m

AI Squared's dlite-v2-774m is a large language model derived from OpenAI's large GPT-2 model and fine-tuned on a corpus of 15k records (Databricks' "Dolly 15k" dataset) to help it exhibit chat-based capabilities. Just like Databricks' Dolly V2 models, dlite-v2-774m (and all other members of the dlite-v2 family) is licensed for both research and commercial use. We are extremely grateful to Databricks for creating the databricks-dolly-15k dataset; without it, we would not be able to create and release this model under such an open and permissive license. While dlite-v2-774m is not a state-of-the-art model, we believe the level of interactivity achievable on such a small, cheaply trained model is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought.

Model Description

Developed by: AI Squared, Inc.
Shared by: AI Squared, Inc.
Model type: Large Language Model
Language(s) (NLP): EN
License: Apache v2.0
Finetuned from model: GPT-2

Bias, Risks, and Limitations

dlite-v2-774m is not a state-of-the-art language model. It is an experimental technology, and as with any experimental technology, AI Squared urges potential users to test its capabilities thoroughly before use. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.

Usage

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers and accelerate libraries installed; see the install command in the sketch after this card. The instruction-following pipeline is loaded with the pipeline function, as sketched below. This loads a custom InstructionTextGenerationPipeline found in the model repo, which is why trust_remote_code=True is required. Including torch_dtype=torch.bfloat16 is generally recommended if this type is supported, in order to reduce memory usage; it does not appear to impact output quality, and it is fine to omit it if there is sufficient memory. You can then use the pipeline to answer instructions. Alternatively, if you prefer not to use trust_remote_code=True, you can download instruct_pipeline.py, store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer.

Model Performance Metrics

We present results from various benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family, sorted by mean score, ascending. These metrics further show that none of the DLite models are state of the art; rather, they show that chat-like behaviors in LLMs can be trained almost independently of model size.

Limitations

DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.
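As with dlite-v2-1.5b above, a minimal sketch assuming the repo ID aisquared/dlite-v2-774m and the standard transformers pipeline API:

```python
import torch
from transformers import pipeline

# Custom instruction-following pipeline from the model repo (hence trust_remote_code=True).
generate_text = pipeline(
    model="aisquared/dlite-v2-774m",  # assumed repo ID
    torch_dtype=torch.bfloat16,       # optional; omit if unsupported
    trust_remote_code=True,
    device_map="auto",
)
print(generate_text("Who was George Washington?"))
```

The trust_remote_code-free variant sketched for dlite-v2-1.5b applies here as well, with the repo ID swapped.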

796 runs on Hugging Face

dlite-v2-355m

Model Card for dlite-v2-355m

AI Squared's dlite-v2-355m is a large language model derived from OpenAI's medium GPT-2 model and fine-tuned on a single GPU on a corpus of 15k records (Databricks' "Dolly 15k" dataset) to help it exhibit chat-based capabilities. Just like Databricks' Dolly V2 models, dlite-v2-355m (and all other members of the dlite-v2 family) is licensed for both research and commercial use. We are extremely grateful to Databricks for creating the databricks-dolly-15k dataset; without it, we would not be able to create and release this model under such an open and permissive license. While dlite-v2-355m is not a state-of-the-art model, we believe the level of interactivity achievable on such a small, cheaply trained model is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought.

Model Description

Developed by: AI Squared, Inc.
Shared by: AI Squared, Inc.
Model type: Large Language Model
Language(s) (NLP): EN
License: Apache v2.0
Finetuned from model: GPT-2

Bias, Risks, and Limitations

dlite-v2-355m is not a state-of-the-art language model. It is an experimental technology, and as with any experimental technology, AI Squared urges potential users to test its capabilities thoroughly before use. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.

Usage

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers and accelerate libraries installed; see the install command in the sketch after this card. The instruction-following pipeline is loaded with the pipeline function, as sketched below. This loads a custom InstructionTextGenerationPipeline found in the model repo, which is why trust_remote_code=True is required. Including torch_dtype=torch.bfloat16 is generally recommended if this type is supported, in order to reduce memory usage; it does not appear to impact output quality, and it is fine to omit it if there is sufficient memory. You can then use the pipeline to answer instructions. Alternatively, if you prefer not to use trust_remote_code=True, you can download instruct_pipeline.py, store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer.

Model Performance Metrics

We present results from various benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family, sorted by mean score, ascending. These metrics further show that none of the DLite models are state of the art; rather, they show that chat-like behaviors in LLMs can be trained almost independently of model size.

Limitations

DLite is an experimental technology and is not designed for use in any environment without significant testing and safety consideration. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.
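Again, a minimal sketch assuming the repo ID aisquared/dlite-v2-355m and the standard transformers pipeline API:

```python
import torch
from transformers import pipeline

# Custom instruction-following pipeline from the model repo (hence trust_remote_code=True).
generate_text = pipeline(
    model="aisquared/dlite-v2-355m",  # assumed repo ID
    torch_dtype=torch.bfloat16,       # optional; omit if unsupported
    trust_remote_code=True,
    device_map="auto",
)
print(generate_text("Write a short poem about the ocean."))
```

The trust_remote_code-free variant sketched for dlite-v2-1.5b applies here as well, with the repo ID swapped.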

341 runs on Hugging Face

dlite-dais-2023

Model Card for DLite-DAIS-2023

dlite-dais-2023 is built from AI Squared's dlite-v2-355m, a large language model derived from OpenAI's medium-sized GPT-2 model and fine-tuned on a corpus of 15k records (Databricks' "Dolly 15k" dataset) to help it exhibit chat-based capabilities. Just like Databricks' Dolly V2 models, dlite-v2-355m (and all other members of the dlite-v2 family) is licensed for both research and commercial use. We are extremely grateful to Databricks for creating the databricks-dolly-15k dataset; without it, we would not be able to create and release this model under such an open and permissive license. While dlite-v2-355m is not a state-of-the-art model, we believe the level of interactivity achievable on such a small, cheaply trained model is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought.

To develop dlite-dais-2023, we took dlite-v2-355m and trained it on two successive datasets:

- The aisquared/dais-2023 dataset, which contains information from the Databricks website and surrounding links regarding the Data and AI Summit 2023
- The aisquared/dais-question-answers dataset, which contains question-answer pairs from ChatGPT compiled using information from the aisquared/dais-2023 dataset

Unfortunately, due to our use of ChatGPT to generate question-answer pairs, this model is not available for commercial use.

Model Description

Developed by: AI Squared, Inc.
Shared by: AI Squared, Inc.
Model type: Large Language Model
Language(s) (NLP): EN
License: Apache v2.0
Finetuned from model: GPT-2

Bias, Risks, and Limitations

dlite-v2-355m is not a state-of-the-art language model. It is an experimental technology, and as with any experimental technology, AI Squared urges potential users to test its capabilities thoroughly before use. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.

Usage

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers and accelerate libraries installed. From your terminal, run:

pip install "accelerate" "transformers[torch]" "torch"

The instruction-following pipeline is loaded with the pipeline function, as sketched below. This loads a custom InstructionTextGenerationPipeline found in the model repo, which is why trust_remote_code=True is required. Including torch_dtype=torch.bfloat16 is generally recommended if this type is supported, in order to reduce memory usage; it does not appear to impact output quality, and it is fine to omit it if there is sufficient memory. Alternatively, if you prefer not to use trust_remote_code=True, you can download instruct_pipeline.py, store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer.
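A minimal sketch of loading the pipeline, assuming the repo ID aisquared/dlite-dais-2023 and the standard transformers pipeline API:

```python
import torch
from transformers import pipeline

# Custom instruction-following pipeline from the model repo (hence trust_remote_code=True).
generate_text = pipeline(
    model="aisquared/dlite-dais-2023",  # assumed repo ID
    torch_dtype=torch.bfloat16,         # optional; omit if unsupported
    trust_remote_code=True,
    device_map="auto",
)
print(generate_text("What is the Data and AI Summit 2023?"))
```

The trust_remote_code-free variant sketched for dlite-v2-1.5b applies here as well, with the repo ID swapped.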

104 runs on Hugging Face

dlite-v1-124m

Model Card for dlite-v1-124m

AI Squared's dlite-v1-124m is a large language model derived from OpenAI's smallest GPT-2 model and fine-tuned on a single T4 GPU on a corpus of 50k records (Stanford Alpaca) to help it exhibit chat-based capabilities. While dlite-v1-124m is not a state-of-the-art model, we believe the level of interactivity achievable on such a small, cheaply trained model is important to showcase, as it continues to demonstrate that creating powerful AI capabilities may be much more accessible than previously thought.

Model Description

Developed by: AI Squared, Inc.
Shared by: AI Squared, Inc.
Model type: Large Language Model
Language(s) (NLP): EN
License: Apache v2.0
Finetuned from model: GPT-2

Bias, Risks, and Limitations

dlite-v1-124m is not a state-of-the-art language model. It is an experimental technology and is not designed for use in any environment other than research. The model can sometimes exhibit undesired behaviors, including but not limited to factual inaccuracies, biases, offensive responses, toxicity, and hallucinations. As with any other LLM, we advise users to exercise good judgment when applying this technology.

Usage

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers and accelerate libraries installed; see the install command in the sketch after this card. The instruction-following pipeline is loaded with the pipeline function, as sketched below. This loads a custom InstructionTextGenerationPipeline found in the model repo, which is why trust_remote_code=True is required. Including torch_dtype=torch.bfloat16 is generally recommended if this type is supported, in order to reduce memory usage; it does not appear to impact output quality, and it is fine to omit it if there is sufficient memory. You can then use the pipeline to answer instructions. Alternatively, if you prefer not to use trust_remote_code=True, you can download instruct_pipeline.py, store it alongside your notebook, and construct the pipeline yourself from the loaded model and tokenizer.

Model Performance Metrics

We present results from various benchmarks on the EleutherAI LLM Evaluation Harness for all models in the DLite family, sorted by mean score, ascending. These metrics further show that none of the DLite models are state of the art; rather, they show that chat-like behaviors in LLMs can be trained almost independently of model size.
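Once more, a minimal sketch assuming the repo ID aisquared/dlite-v1-124m and the standard transformers pipeline API:

```python
# Install dependencies first:
#   pip install "accelerate" "transformers[torch]" "torch"
import torch
from transformers import pipeline

# Custom instruction-following pipeline from the model repo (hence trust_remote_code=True).
generate_text = pipeline(
    model="aisquared/dlite-v1-124m",  # assumed repo ID
    torch_dtype=torch.bfloat16,       # optional; omit if unsupported
    trust_remote_code=True,
    device_map="auto",
)
print(generate_text("Summarize what a transformer language model is."))
```

The trust_remote_code-free variant sketched for dlite-v2-1.5b applies here as well, with the repo ID swapped.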

87 runs on Hugging Face
