- Workshop -> https://github.com/aws-samples/amazon-bedrock-workshop
Pre-Req
• AWS CLI, check aws --version
○ Create a Demo user from the console and create the needed inline policy as per the AWS workshop
○ (I tried with a demo sys-admin user that I had created)
• Python 3
• Pip3
• Set aliases in ~/.bashrc using the vi editor
alias python=python3
alias pip=pip3
- Apply this using . ~/.bashrc from terminal
• Check from terminal
python --version
pip --version
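Once the CLI and the demo user are set up, a quick way to confirm the credentials actually work is to call STS from Python. A minimal sketch, assuming boto3 is installed and reads the same credentials as the AWS CLI:

import boto3

# Confirm the demo user's credentials are picked up and valid.
sts = boto3.client("sts")
identity = sts.get_caller_identity()
print("Account:", identity["Account"])
print("User ARN:", identity["Arn"])

If this prints the demo user's ARN, boto3 and the workshop notebooks will run under the same identity as the CLI.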
Setup Local Virtual Environment
Make sure to clone the GitHub repo and set up the virtual environment inside that repo.
Create a folder for the workshop and run the following commands so that all the dependent libraries are installed within that specific folder. This prevents version mismatches in case different programs need different versions.
• python -m venv .venv
[-m stands for module. This creates a virtual env, which we named ".venv". The leading dot in the environment name is optional. Any Python packages we install from now on get installed in this directory.]
• ls -la
• source .venv/bin/activate
[Note: You need to run this the first time you create the virtual env, and again whenever you restart the system or are otherwise outside the virtual env]
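To double-check that you are actually inside the virtual env and not the system Python, a small sanity check from the Python prompt:

import sys

# sys.prefix points inside .venv when the virtual env is active;
# sys.base_prefix is the system installation it was created from.
print("Running from:", sys.prefix)
print("Virtual env active:", sys.prefix != sys.base_prefix)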
Install Jupyter Notebooks
Python is an interpreted programming language. It runs on your local machine, but ML needs a higher-configuration system. You can host Jupyter on a server (say Amazon EC2), run the following commands there, expose the URL, and access it from your local Mac or designated PC.
Data scientists will typically have GPU servers somewhere remote where the Jupyter server runs, and they connect from their local machine.
• pip install jupyterlab
• pip install ipykernel
• python -m ipykernel install --user --name=my-python3-kernel
(my-python3-kernel is the kernel name for this installation / configuration, which will be seen in your Jupyter Notebooks)
• jupyter kernelspec list
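Besides jupyter kernelspec list, you can confirm the new kernel is registered from Python itself. A sketch using jupyter_client, which is installed as a dependency of jupyterlab:

from jupyter_client.kernelspec import KernelSpecManager

# Maps each registered kernel name to the directory holding its kernel.json.
specs = KernelSpecManager().find_kernel_specs()
for name, path in specs.items():
    print(name, "->", path)  # my-python3-kernel should appear here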
Launch Jupyter
Run the command jupyter lab from the command line.
This will launch Jupyter in the browser. If not, look for the URL in the terminal output:
To access the server, open this file in a browser:
file:///Users/bsubramani/Library/Jupyter/runtime/jpserver-73482-open.html
Or copy and paste one of these URLs:
http://localhost:8888/lab?token=17025f94dd16b386d7748e0e4ef89396fc588546070f0f90
http://127.0.0.1:8888/lab?token=17025f94dd16b386d7748e0e4ef89396fc588546070f0f90
Bedrock
• Request model access (ideally all models, because they are needed for the complete demo). It will ask for a use case before granting access to the Anthropic Claude models.
- Beware of Bedrock pricing - you are billed per token, and a word works out to roughly 1.3 tokens on average, so a 10-word prompt is about 10 * 1.3 = 13 tokens. Refer to https://aws.amazon.com/bedrock/pricing/ for more details.
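To get a feel for that rule of thumb, here is a minimal back-of-the-envelope sketch. The 1.3 ratio is the rough average from above, and the per-1k-token price is a made-up placeholder, not a published rate - always check the pricing page:

TOKENS_PER_WORD = 1.3              # rough rule of thumb from above
PRICE_PER_1K_INPUT_TOKENS = 0.008  # hypothetical placeholder price

prompt = "Explain the difference between Top K and Top P sampling"
estimated_tokens = round(len(prompt.split()) * TOKENS_PER_WORD)
estimated_cost = estimated_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
print(f"~{estimated_tokens} tokens, ~${estimated_cost:.6f} for this prompt")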
Details
- Temperature - English has on the order of 50k common words, and based on the words so far the model predicts the next word from a probability distribution
- Temp ranges from 0 to 1 -> keep it close to 1 if the model has to be creative. Keep it close to 0 and it will give more deterministic output that stays close to its training data
Common inference parameter definitions
Randomness and Diversity
Foundation models generally support the following parameters to control randomness and diversity in the response.
Temperature – Large language models use probability to construct the words in a sequence. For any given next word, there is a probability distribution of options for the next word in the sequence. When you set the temperature closer to zero, the model tends to select the higher-probability words. When you set the temperature further away from zero, the model may select a lower-probability word.
In technical terms, the temperature modulates the probability density function for the next tokens, implementing the temperature sampling technique. This parameter can deepen or flatten the density function curve. A lower value results in a steeper curve with more deterministic responses, and a higher value results in a flatter curve with more random responses.
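The steeper-vs-flatter idea is easy to see in code: temperature sampling divides the logits by the temperature before the softmax. A minimal sketch with made-up logits:

import numpy as np

def softmax_with_temperature(logits, temperature):
    # Low temperature sharpens the distribution (more deterministic);
    # high temperature flattens it (more random).
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = [4.0, 2.5, 1.0]  # made-up scores for "horses", "zebras", "unicorns"
for t in (0.2, 1.0, 2.0):
    print(f"T={t}:", np.round(softmax_with_temperature(logits, t), 3))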
Top K – Temperature defines the probability distribution of potential words, and Top K defines the cut off where the model no longer selects the words. For example, if K=50, the model selects from 50 of the most probable words that could be next in a given sequence. This reduces the probability that an unusual word gets selected next in a sequence. In technical terms, Top K is the number of the highest-probability vocabulary tokens to keep for Top-K filtering. This limits the distribution of probable tokens, so the model chooses one of the highest-probability tokens.
Top P – Top P defines a cut off based on the sum of probabilities of the potential choices. If you set Top P below 1.0, the model considers the most probable options and ignores less probable ones. Top P is similar to Top K, but instead of capping the number of choices, it caps choices based on the sum of their probabilities. For the example prompt "I hear the hoof beats of …," you may want the model to provide "horses," "zebras" or "unicorns" as the next word. If you set the temperature to its maximum, without capping Top K or Top P, you increase the probability of getting unusual results such as "unicorns." If you set the temperature to 0, you increase the probability of "horses." If you set a high temperature and set Top K or Top P to the maximum, you increase the probability of "horses" or "zebras," and decrease the probability of "unicorns."
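These three knobs map directly onto the Bedrock invoke_model request body. A sketch using the Claude v2 text-completion format from the workshop; adjust the model ID and body schema if you use a different model:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

body = json.dumps({
    "prompt": "\n\nHuman: I hear the hoof beats of\n\nAssistant:",
    "max_tokens_to_sample": 50,
    "temperature": 0.9,  # closer to 1 -> more creative
    "top_k": 50,         # sample only from the 50 most probable tokens
    "top_p": 0.9,        # or cap by cumulative probability instead
})

response = bedrock_runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(response["body"].read())["completion"])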
Length
The following parameters control the length of the generated response.
Response length – Configures the minimum and maximum number of tokens to use in the generated response.
Length penalty – Length penalty optimizes the model to be more concise in its output by penalizing longer responses. Length penalty differs from response length as the response length is a hard cut off for the minimum or maximum response length.
In technical terms, the length penalty penalizes the model exponentially for lengthy responses. 0.0 means no penalty. Set a value less than 0.0 for the model to generate longer sequences, or set a value greater than 0.0 for the model to produce shorter sequences.
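As an illustration of that sign convention, one common formulation subtracts penalty * length from the sequence's log-probability score, which is an exponential penalty on the raw probability. This is a made-up scoring sketch, not Bedrock's internal formula:

def apply_length_penalty(log_prob_score, num_tokens, penalty):
    # Subtracting penalty * length in log space multiplies the probability
    # by exp(-penalty * length): 0.0 -> no change, > 0.0 favors shorter
    # sequences, < 0.0 favors longer ones.
    return log_prob_score - penalty * num_tokens

print(apply_length_penalty(-5.0, num_tokens=10, penalty=0.5))  # -10.0
print(apply_length_penalty(-5.0, num_tokens=40, penalty=0.5))  # -25.0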
Repetitions
The following parameters help control repetition in the generated response.
Repetition penalty (presence penalty) – Prevents repetitions of the same words (tokens) in responses. 1.0 means no penalty; greater than 1.0 decreases repetition.
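One widely used formulation (from the CTRL paper) divides the logits of already-generated tokens by the penalty, which makes the 1.0-means-no-penalty convention obvious. A sketch, not necessarily what the Bedrock models do internally:

import numpy as np

def apply_repetition_penalty(logits, generated_token_ids, penalty):
    # penalty = 1.0 leaves logits unchanged; penalty > 1.0 shrinks the
    # scores of tokens that already appeared, making repeats less likely.
    logits = np.array(logits, dtype=float)
    for token_id in set(generated_token_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits

print(apply_repetition_penalty([3.0, -1.0, 2.0], generated_token_ids=[0, 1], penalty=1.2))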
Error Handling:
ConflictException: An error occurred (ConflictException) when calling the CreateSecurityPolicy operation: Policy with name bedrock-sample-rag-sp-743 and type encryption already exist
As the message says, this tends to show up when you re-run the RAG notebook setup: an OpenSearch Serverless security policy with that name was already created on a previous run, so either delete the old policy or pick a new name.
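I hit this on a re-run, so one option is to make the setup idempotent by catching the conflict and reusing the existing policy. A hedged sketch against the OpenSearch Serverless client - the policy name and document below are placeholders patterned on the workshop run, adapt them to yours:

import json
import boto3
from botocore.exceptions import ClientError

aoss = boto3.client("opensearchserverless")

# Placeholder encryption policy scoped to the workshop's collections.
policy_doc = {
    "Rules": [{"ResourceType": "collection",
               "Resource": ["collection/bedrock-sample-rag-*"]}],
    "AWSOwnedKey": True,
}

try:
    aoss.create_security_policy(
        name="bedrock-sample-rag-sp-743",  # name from the error above
        type="encryption",
        policy=json.dumps(policy_doc),
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConflictException":
        print("Security policy already exists, reusing it")
    else:
        raise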