Fine-tuned Llama 2 on Google GCP and DataRobot¶
Access this AI accelerator on GitHub
There is a wide variety of open source large language models (LLMs). For example, there has been a lot of interest in Llama and its derivatives such as Alpaca and Vicuna, as well as in other model families such as Falcon and Mistral. Because these LLMs require expensive GPUs, users often want to compare cloud providers to find the best hosting option. In this accelerator, you will work with Google Cloud Platform (GCP) to host Llama 2.
You may also want to integrate with the cloud provider that hosts your Virtual Private Cloud (VPC) so that you can ensure proper authentication and restrict access to the model to traffic from within the VPC. While this accelerator uses authentication over the public internet, it is possible to use Google's cloud infrastructure to adapt the deployment to your cloud architecture needs, including provisioning scale-out policies.
Finally, by using Vertex AI as a managed service, you can integrate that infrastructure into your existing stack to meet monitoring needs. These range from service health, CPU usage, and low-level alerting to billing, cost attribution, and account management. You can also use GCP's tools to route information into BigQuery for ad hoc analytics, log exploration, and more.
Llama 2¶
For information about Llama 2, you can read:
- The model card on Hugging Face.
- The paper released on arXiv.
Llama is available from Meta for download.
Llama 13B-Instruct¶
The Llama-13b-instruct model has been fine-tuned on datasets available from Hugging Face and is designed specifically for instruction-based use cases. It was trained to expect the [INST] and [/INST] control tokens around a user message, and to begin with the beginning-of-sequence token (<s>). For example:
<s> [INST] What is your favorite condiment? [/INST]
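The prompt format above can be produced with a small helper. This is a minimal sketch, not part of the accelerator itself: the function name `build_prompt` and the `add_bos` flag are hypothetical, and the flag exists because some tokenizers and serving stacks may prepend the `<s>` token for you, in which case adding it again would duplicate it.

```python
# Control tokens described in the text above.
BOS = "<s>"
INST_OPEN = "[INST]"
INST_CLOSE = "[/INST]"


def build_prompt(user_message: str, add_bos: bool = True) -> str:
    """Wrap a single user message in the instruction control tokens.

    `add_bos` is a hypothetical switch: set it to False if the tokenizer
    or serving layer already prepends the beginning-of-sequence token.
    """
    body = f"{INST_OPEN} {user_message.strip()} {INST_CLOSE}"
    return f"{BOS} {body}" if add_bos else body


if __name__ == "__main__":
    # prints: <s> [INST] What is your favorite condiment? [/INST]
    print(build_prompt("What is your favorite condiment?"))
```

Keeping the tokens in one place makes it easier to stay consistent between the prompts you test locally and the ones you send to the hosted model.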
Overview of GCP¶
The GCP instance types listed below can host Llama-13B with acceleration:
- g2-standard-8 with 1 L4 GPU: 8 vCPUs, 32 GB of RAM, $623 per month
- n1-standard-16 with 2 V100 GPUs: 16 vCPUs, 60 GB of RAM, $388 per month
- n1-standard-16 with 2 T4 GPUs: 16 vCPUs, 60 GB of RAM + 32 GB + 32 GB, $388 per month
- a2-highgpu-1g with 1 A100 GPU: 12 vCPUs, 85 GB of RAM, $2,682 per month
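To compare these options side by side, it can help to put the listed figures into a small data structure. The sketch below only restates the monthly prices from the list above and derives an approximate hourly rate; the `InstanceOption` class, the `cheapest_first` helper, and the 730 hours-per-month figure are assumptions made for illustration, and actual GCP pricing varies by region and over time.

```python
from dataclasses import dataclass

# Assumption: an average month has about 730 hours (8,760 hours / 12).
HOURS_PER_MONTH = 730


@dataclass(frozen=True)
class InstanceOption:
    """One hosting option, using only the figures listed above."""
    machine_type: str
    gpu: str
    gpu_count: int
    vcpus: int
    ram_gb: int  # host RAM only; extra memory noted in the list is not modeled
    usd_per_month: float

    @property
    def usd_per_hour(self) -> float:
        # Rough hourly rate derived from the monthly price.
        return self.usd_per_month / HOURS_PER_MONTH


OPTIONS = [
    InstanceOption("g2-standard-8", "L4", 1, 8, 32, 623),
    InstanceOption("n1-standard-16", "V100", 2, 16, 60, 388),
    InstanceOption("n1-standard-16", "T4", 2, 16, 60, 388),
    InstanceOption("a2-highgpu-1g", "A100", 1, 12, 85, 2682),
]


def cheapest_first(options):
    """Return the options sorted by monthly price, lowest first."""
    return sorted(options, key=lambda o: o.usd_per_month)


if __name__ == "__main__":
    for opt in cheapest_first(OPTIONS):
        print(
            f"{opt.machine_type:<15} {opt.gpu_count}x {opt.gpu:<5} "
            f"${opt.usd_per_month:>6,.0f}/month  ~${opt.usd_per_hour:.2f}/hour"
        )
```

Price is only one axis; GPU memory and throughput also determine whether a given option can serve the model at the latency you need.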