
LLM Call Parameters: A Complete Guide

Large Language Models (LLMs) have become essential in powering applications like chatbots, content generators, code assistants, and more. When interacting with an LLM, understanding the various call parameters can help fine-tune the model’s behavior, improve the quality of responses, and control the output effectively. This guide dives deep into the most important parameters and how to use them effectively.


What Are LLM Call Parameters? #

LLM call parameters are configuration settings that determine how the model generates output. These parameters allow you to control factors like creativity, relevance, and structure of the output.

Different LLM providers (like OpenAI, Hugging Face, etc.) expose slightly different parameter sets, but the most common ones include:

- temperature
- top_p (nucleus sampling)
- min_p
- max_tokens
- frequency_penalty
- presence_penalty
- top_k
- top_a
- typical_p
- repetition_penalty
- stop (stop sequences)

Let's break down each of these in detail.


1. Temperature #

The temperature parameter controls the randomness of the model's output: lower values (e.g. 0.2) make responses more focused and deterministic, while higher values (e.g. 1.0 and above) make them more diverse and creative. It affects how the model weights candidate tokens when generating a response.

Mathematical Expression #

When using temperature, logits (raw prediction scores) are transformed into probabilities using the softmax function:

\[ P(x_i) = \frac{\exp\left(\frac{\mathrm{logit}_i}{T}\right)}{\sum_j \exp\left(\frac{\mathrm{logit}_j}{T}\right)} \]

Where:

- \(P(x_i)\) is the probability of selecting token \(x_i\)
- \(\mathrm{logit}_i\) is the raw, unnormalized score the model assigns to token \(x_i\)
- \(T\) is the temperature; \(T < 1\) sharpens the distribution, \(T > 1\) flattens it

Example:

{
  "temperature": 0.7
}
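
To make the formula concrete, here is a minimal sketch (plain NumPy, not any provider's actual implementation) of how temperature rescales logits before sampling:

import numpy as np

def temperature_sample(logits, temperature=0.7, rng=np.random.default_rng()):
    """Scale logits by 1/T, apply softmax, then sample one token index."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                          # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Lower T concentrates mass on the top logit; higher T flattens the distribution.
print(temperature_sample([2.0, 1.0, 0.1], temperature=0.2))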

2. Top-p (Nucleus Sampling) #

top_p (also known as nucleus sampling) defines a cumulative probability threshold for selecting tokens. Instead of considering the entire vocabulary, the model samples only from the smallest set of most-likely tokens whose combined probability reaches p.

Tip: When using top_p, consider reducing temperature to balance control and creativity.

Example:

{
  "top_p": 0.9
}
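
A rough sketch of the nucleus-sampling filter (illustrative only; real implementations operate on the model's full vocabulary):

import numpy as np

def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]                   # indices, most likely first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # how many tokens to keep
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

print(top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9))  # drops the 0.05 tail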

3. Min-p (Minimum Probability) #

min_p sets a lower bound on the probability a token must have to be considered during generation. In most implementations the bound is relative: a token is kept only if its probability is at least min_p times the probability of the most likely token, which filters out extremely unlikely options while adapting to how confident the model is.

Tip: min_p can be useful when combined with top_p to balance diversity and coherence.

Example:

{
  "min_p": 0.05
}
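
A minimal sketch of the relative form of min-p described above (the convention used by several open-source inference engines; check how your provider defines it):

import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Discard tokens whose probability is below min_p times the probability
    of the most likely token, then renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = min_p * probs.max()
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()

print(min_p_filter([0.6, 0.3, 0.08, 0.02], min_p=0.05))  # 0.02 < 0.05 * 0.6, so it is dropped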

4. Max Tokens #

max_tokens caps the length of the generated output, measured in tokens. Keep in mind that the model's context window is shared between the prompt and the completion, so a long prompt leaves less room for the response.

Tip: Be mindful of the token limit to avoid truncated responses.

Example:

{
  "max_tokens": 500
}
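
One way to avoid truncation is to count the prompt tokens before choosing max_tokens. The sketch below uses the tiktoken library and assumes an 8,192-token context window (substitute your model's actual limit and tokenizer):

import tiktoken  # OpenAI's tokenizer library; other model families use different tokenizers

CONTEXT_WINDOW = 8192          # assumed limit; check your model's documented context size
prompt = "Summarize the following article: ..."

enc = tiktoken.get_encoding("cl100k_base")
prompt_tokens = len(enc.encode(prompt))

# Leave the rest of the window for the completion, capped at the 500 tokens we want.
max_tokens = min(500, CONTEXT_WINDOW - prompt_tokens)
print(prompt_tokens, max_tokens)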

5. Frequency Penalty #

frequency_penalty discourages the model from repeating words or phrases by penalizing tokens in proportion to how often they have already appeared in the generated text. (A combined code sketch covering this and the presence penalty follows the next section.)

Use Case: Creative writing, summarization, or any task where repetition is undesirable.

Example:

{
  "frequency_penalty": 0.5
}

6. Presence Penalty #

presence_penalty applies a one-time penalty to any token that has already appeared in the text, which makes the model more likely to introduce new words and topics.

Example:

{
  "presence_penalty": 0.3
}
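
The two penalties are easiest to see side by side. This sketch follows the OpenAI-documented style of logit adjustment, where frequency_penalty scales with a token's count and presence_penalty is a flat deduction for any token seen at least once (other providers may implement it differently):

from collections import Counter

def apply_penalties(logits, generated_ids, frequency_penalty=0.5, presence_penalty=0.3):
    """Lower the logits of tokens that already appear in the generated text.

    frequency_penalty scales with the token's count; presence_penalty is a
    one-time deduction for any token seen at least once."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= count * frequency_penalty + presence_penalty
    return adjusted

# Token 2 appeared twice, token 0 once; both are pushed down, token 2 more so.
print(apply_penalties([1.0, 1.0, 1.0, 1.0], generated_ids=[2, 0, 2]))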

7. Top-k Sampling (Less Frequently Used) #

top_k limits the model to selecting from the top k most likely tokens at each generation step.

Example:

{
  "top_k": 50
}
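
Top-k is the simplest of the filters; a sketch for illustration:

import numpy as np

def top_k_filter(probs, top_k=50):
    """Zero out everything except the top_k most likely tokens, then renormalize."""
    probs = np.asarray(probs, dtype=np.float64)
    if top_k >= len(probs):
        return probs
    keep = np.argsort(probs)[::-1][:top_k]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

print(top_k_filter([0.4, 0.3, 0.2, 0.1], top_k=2))  # only the two largest survive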

8. Top-a Sampling (Less Frequently Used) #

top_a filters candidate tokens with a threshold that adapts to the model's confidence: when the top token's probability is high, more of the low-probability tail is pruned; when the distribution is flat, more candidates are kept.

Example:

{
  "top_a": 0.8
}
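
top_a is not standardized across providers. The sketch below follows one common open-source formulation, in which the cutoff is top_a times the square of the top token's probability, so the filter is strict when the model is confident and permissive when it is not:

import numpy as np

def top_a_filter(probs, top_a=0.8):
    """Drop tokens below top_a * (max probability)^2 -- one common definition,
    not a universal standard."""
    probs = np.asarray(probs, dtype=np.float64)
    threshold = top_a * probs.max() ** 2
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()

# Confident distribution: cutoff = 0.8 * 0.7^2 = 0.392, so only the top token survives.
print(top_a_filter([0.7, 0.2, 0.1], top_a=0.8))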

9. Typical-p (Typical Decoding) (Less Frequently Used) #

typical_p keeps the tokens whose information content is closest to the expected information content (entropy) of the predicted distribution, discarding both very predictable and very surprising candidates until their cumulative probability reaches p.

Example:

{
  "typical_p": 0.9
}
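
A rough sketch of typical decoding, which ranks tokens by how close their surprisal is to the distribution's entropy and keeps the closest ones until their cumulative probability reaches typical_p (illustrative only):

import numpy as np

def typical_p_filter(probs, typical_p=0.9):
    """Keep tokens whose surprisal (-log p) is closest to the entropy of the
    distribution, until their cumulative probability reaches typical_p."""
    probs = np.asarray(probs, dtype=np.float64)
    surprisal = -np.log(probs)
    entropy = np.sum(probs * surprisal)
    order = np.argsort(np.abs(surprisal - entropy))    # most "typical" tokens first
    cutoff = np.searchsorted(np.cumsum(probs[order]), typical_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

print(typical_p_filter([0.5, 0.3, 0.15, 0.05], typical_p=0.9))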

10. Repetition Penalty #

repetition_penalty discourages the model from repeating the same phrases or words excessively.

Use Case: Useful for longer conversations or tasks where variety is critical.

Example:

{
  "repetition_penalty": 1.2
}
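
A sketch in the style of the repetition penalty used by many open-source inference engines (originating in the CTRL paper): logits of tokens already seen are divided by the penalty if positive and multiplied by it if negative, so a value above 1.0 always pushes repeats down:

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Divide positive logits (and multiply negative ones) for already-seen tokens,
    making them less likely to be chosen again."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted

print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1]))
# -> [1.666..., -1.2, 0.5]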

11. Stop Sequences #

stop defines sequences of characters or words that signal the model to stop generating text.

Example:

{
  "stop": ["\n", "End"]
}
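
Stop sequences are enforced server-side by most APIs, but a client-side sketch of the same idea, cutting the text at the first stop sequence, looks like this:

def truncate_at_stop(text, stop_sequences=("\n", "End")):
    """Cut the generated text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]

print(truncate_at_stop("The answer is 42.\nMore text..."))  # -> "The answer is 42."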

Tuning Parameters for Optimal Output #

Here are a few general tips for tuning LLM call parameters:

- Change one parameter at a time so you can tell which change caused which effect.
- Use a low temperature (and/or low top_p) for factual or deterministic tasks, and higher values for creative tasks.
- Avoid stacking many sampling filters (top_p, top_k, min_p, typical_p) at once; they interact and can over-restrict the output.
- If the model rambles or repeats itself, raise frequency_penalty or repetition_penalty before touching anything else.
- Set max_tokens with your prompt length and the model's context window in mind.

Example Configuration for Creative Writing: #

{
  "temperature": 0.8,
  "top_p": 0.9,
  "min_p": 0.05,
  "max_tokens": 1000,
  "frequency_penalty": 0.2,
  "presence_penalty": 0.5,
  "top_k": 50,
  "typical_p": 0.9,
  "repetition_penalty": 1.2,
  "stop": ["END"]
}

Example Configuration for Summarization: #

{
  "temperature": 0.3,
  "top_p": 0.8,
  "min_p": 0.0,
  "max_tokens": 300,
  "frequency_penalty": 0.5,
  "presence_penalty": 0.0,
  "top_k": 10,
  "repetition_penalty": 1.1
}
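
To actually use one of these configurations, pass it alongside the prompt in your provider's request body. The sketch below posts the summarization settings to a hypothetical OpenAI-compatible endpoint (the URL, model name, and API key are placeholders). Note that hosted APIs typically accept only a subset of these fields; parameters like min_p, top_k, and repetition_penalty are mostly supported by open-source servers, so trim the dictionary to what your provider documents.

import requests

config = {
    "temperature": 0.3,
    "top_p": 0.8,
    "max_tokens": 300,
    "frequency_penalty": 0.5,
    "repetition_penalty": 1.1,   # only some servers accept this field
}

# Placeholder endpoint, model, and key -- substitute your provider's real values.
response = requests.post(
    "https://your-llm-server.example.com/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Summarize: ..."}],
        **config,
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])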

Conclusion #

Understanding and optimizing LLM call parameters is key to getting the desired output from large language models. Whether you are generating code, writing creative stories, or providing technical summaries, fine-tuning parameters like temperature, top_p, min_p, typical_p, and repetition_penalty will help you get the best results. Experimentation is crucial, so don’t be afraid to try different configurations to see what works best for your application.