
1 API description

1.1 Authentication

To use the REST API, you must create an account and obtain an API key. Each request must include the following header:
Authorization: Bearer YOUR_API_KEY

1.2 Engines

Most endpoints require an engine_id to operate. The following engines are currently available:
  • gptj_6B: GPT-J is a language model with 6 billion parameters published by EleutherAI and trained on the Pile (825 GB of text data). Its main language is English, but it is also fluent in several other languages. It was also trained on several programming languages.
  • boris_6B: Boris is a version of GPT-J fine-tuned for the French language. Use this model if you want the best performance in French.
  • fairseq_gpt_13B: Fairseq GPT 13B is an English language model with 13 billion parameters. Its training corpus is less diverse than GPT-J's, but it performs better, at least on pure English language tasks.
  • gptneox_20B: GPT-NeoX-20B is the largest publicly available English language model, with 20 billion parameters. It was trained on the same corpus as GPT-J.
  • m2m100_1_2B: M2M100 1.2B is a 1.2 billion parameter language model specialized for translation. It supports multilingual translation between 100 languages.

1.3 Text completions

The API syntax for text completions is:
POST https://api.textsynth.com/v1/engines/{engine_id}/completions
where engine_id is the selected engine.
Request body (JSON)
  • prompt: string.

    The input text to complete.

  • max_tokens: optional int (default = 100)

    Maximum number of tokens to generate. A token represents about 4 characters of English text. The total number of tokens (prompt + generated text) cannot exceed the model's maximum context length, which is 2048 for GPT-J and 1024 for the other models.

    If the prompt length is larger than the model's maximum context length, the beginning of the prompt is discarded.

  • stream: optional boolean (default = false)

    If true, the output is streamed so that the result can be displayed before the complete output is generated. Several JSON answers are returned, each followed by two line feed characters.

  • stop: optional string or array of string (default = null)

    Stop the generation when any of the strings is encountered. The generated text does not contain the stop string. The array contains at most 5 strings.

  • n: optional integer (range: 1 to 16, default = 1)

    Generate n completions from a single prompt.

  • temperature: optional number (default = 1)

    Sampling temperature. A higher temperature makes the model select less common tokens, leading to more diversity but potentially less relevant output. It is usually better to tune top_p or top_k.

  • top_k: optional integer (range: 1 to 1000, default = 40)

    Select the next output token among the top_k most likely ones. A higher top_k gives more diversity but a potentially less relevant output.

  • top_p: optional number (range: 0 to 1, default = 0.9)

    Select the next output token among the most probable ones so that their cumulative probability is larger than top_p. A higher top_p gives more diversity but a potentially less relevant output. top_p and top_k are combined, meaning that at most top_k tokens are selected. A value of 1 disables this sampling.
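To make the interaction of temperature, top_k and top_p concrete, here is a minimal Python sketch of one sampling step over a list of logits. It follows the description above (at most top_k tokens, then the smallest set whose cumulative probability exceeds top_p); applying the temperature before both cuts is an assumption, and the function name sampling_candidates is hypothetical. It illustrates the technique and is not TextSynth's server code.

```python
import math

def sampling_candidates(logits, temperature=1.0, top_k=40, top_p=0.9):
    """Return (token_index, probability) pairs that remain eligible for sampling."""
    # Temperature scaling (assumed to happen first), then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least probable and keep at most top_k of them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Nucleus cut: keep tokens until their cumulative probability exceeds top_p.
    # top_p = 1 disables this step, leaving all top_k tokens.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if top_p < 1 and cumulative > top_p:
            break

    # Renormalize over the surviving tokens.
    mass = sum(probs[i] for i in kept)
    return [(i, probs[i] / mass) for i in kept]
```

A token could then be drawn with random.choices([i for i, _ in cands], weights=[p for _, p in cands])[0]. With probabilities 0.5, 0.3, 0.15 and 0.05 and the default top_p = 0.9, the first three tokens survive, since 0.5 + 0.3 = 0.8 is not yet larger than 0.9.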

More advanced sampling parameters are available:
  • logit_bias: optional object (default = {})

    Modify the likelihood of the specified tokens in the completion. The specified object is a map between the token indexes and the corresponding logit bias. A negative bias reduces the likelihood of the corresponding token. The bias must be between -100 and 100. Note that the token indexes are specific to the selected model. You can use the tokenize API endpoint to retrieve the token indexes of a given model.
    Example: if you want to ban the " unicorn" token for GPT-J, you can use: logit_bias: { "44986": -100 }

  • presence_penalty: optional number (range: -2 to 2, default = 0)

    A positive value penalizes tokens that have already appeared in the generated text, which pushes the model toward more diverse output.

  • frequency_penalty: optional number (range: -2 to 2, default = 0)

    A positive value penalizes tokens that have already appeared in the generated text, proportionally to their frequency, which pushes the model toward more diverse output.

  • repetition_penalty: optional number (default = 1)

    The logits of tokens that have already appeared in the generated text are divided by repetition_penalty. A value of 1 effectively disables it. See this article for more details.

  • typical_p: optional number (range: 0 to 1, default = 1)

    An alternative to top_p sampling: instead of selecting tokens starting from the most probable one, start from those whose information content (negative log likelihood) is closest to the entropy of the next-token distribution. As with top_p, at most top_k tokens are selected. A value of 1 disables this sampling. See this article for more details.
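These advanced parameters can be pictured as transformations applied to the logits (or probabilities) before a token is drawn. The description above does not give exact formulas, so the Python sketch below makes explicit assumptions: logit_bias is added to the raw logits; presence_penalty and frequency_penalty follow the formula documented for OpenAI's API; repetition_penalty follows the CTRL convention used by Hugging Face transformers (negative logits are multiplied rather than divided); typical_p follows locally typical sampling. All function names are hypothetical.

```python
import math
from collections import Counter

def apply_logit_bias(logits, logit_bias):
    """Add the request's logit_bias map (string token index -> bias) to the logits."""
    out = list(logits)
    for token, bias in logit_bias.items():
        if not -100 <= bias <= 100:
            raise ValueError("bias must be between -100 and 100")
        out[int(token)] += bias
    return out

def apply_presence_frequency(logits, generated, presence_penalty=0.0, frequency_penalty=0.0):
    """Assumed formula (as documented for OpenAI's API):
    logit[j] -= count[j] * frequency_penalty + (count[j] > 0) * presence_penalty
    """
    out = list(logits)
    for token, count in Counter(generated).items():  # only tokens seen at least once
        out[token] -= count * frequency_penalty + presence_penalty
    return out

def apply_repetition_penalty(logits, generated, repetition_penalty=1.0):
    """CTRL-style penalty: divide positive logits of already generated tokens.
    Negative logits are multiplied instead (assumed Hugging Face convention),
    so a penalty > 1 always makes the token less likely."""
    out = list(logits)
    for token in set(generated):
        s = out[token]
        out[token] = s / repetition_penalty if s > 0 else s * repetition_penalty
    return out

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def typical_candidates(probs, typical_p=0.9, top_k=40):
    """Locally typical sampling filter, only meaningful when typical_p < 1.

    Ranks tokens by |-log p - H| (H = entropy of the distribution), keeps at
    most top_k of them, and stops once their cumulative probability reaches
    typical_p. Returns (token, renormalized probability) pairs.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    ranked = sorted(
        (i for i, p in enumerate(probs) if p > 0),
        key=lambda i: abs(-math.log(probs[i]) - entropy),
    )[:top_k]
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= typical_p:
            break
    mass = sum(probs[i] for i in kept)
    return [(i, probs[i] / mass) for i in kept]
```

For probabilities 0.4, 0.3, 0.2 and 0.1, the entropy is about 1.28 nats, so the 0.3 token (information content about 1.20) ranks as more typical than the most probable 0.4 token (about 0.92). This is the difference from top_p, which always starts from the most probable token.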

Answer (JSON)
  • text: string or array of string

    The completed text. If the n parameter is larger than 1, an array of strings is returned.

  • reached_end: boolean

    If true, indicates that this is the last answer. It is only useful for streaming output (stream = true in the request).

  • truncated_prompt: boolean (default = false)

    If true, indicates that the prompt was truncated because it was longer than the model's maximum context length. Only the end of the prompt is used to generate the completion.

  • input_tokens: integer

    The number of input tokens. It is useful to estimate the amount of compute resources used by the request.

  • output_tokens: integer

    The total number of generated tokens. It is useful to estimate the amount of compute resources used by the request.

With streaming output, several answers may be returned. Each answer is always followed by two line feed characters.
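Because a streamed response is a sequence of JSON answers separated by two line feed characters, a client has to buffer the incoming text and split on that separator, since a network read can end in the middle of an answer. A minimal Python sketch, assuming each answer is serialized without blank lines inside it so that "\n\n" only appears as the separator; iter_stream_answers is a hypothetical helper that takes already decoded text fragments.

```python
import json

def iter_stream_answers(chunks):
    """Yield decoded JSON answers from a streamed completion response.

    `chunks` is any iterable of text fragments of the response body. A fragment
    boundary may fall anywhere, including inside an answer.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete answer currently held in the buffer.
        while "\n\n" in buffer:
            answer, buffer = buffer.split("\n\n", 1)
            if answer.strip():
                yield json.loads(answer)
    if buffer.strip():
        # Trailing data without the final separator (not expected normally).
        yield json.loads(buffer)
```

A client would typically stop reading once it receives an answer whose reached_end field is true.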
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"prompt": "Once upon a time, there was", "max_tokens": 20 }'
Answer:
{
    "text": " a woman who loved to get her hands on a good book. She loved to read and to tell",
    "reached_end": true,
    "input_tokens": 7,
    "output_tokens": 20
}

Python example: textsynth.py
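The curl request above can also be reproduced with the Python standard library alone. This sketch is not textsynth.py: the helper names and the TEXTSYNTH_API_KEY environment variable are hypothetical, and only the URL, headers and JSON body come from the example above.

```python
import json
import os
import urllib.request

API_BASE = "https://api.textsynth.com/v1/engines"

def build_completion_request(prompt, api_key, engine_id="gptj_6B", **params):
    """Build (without sending) the same POST request as the curl example."""
    body = {"prompt": prompt, **params}
    return urllib.request.Request(
        f"{API_BASE}/{engine_id}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def complete(prompt, engine_id="gptj_6B", **params):
    """Send a non-streaming completion request and return the decoded JSON answer.
    The API key is read from TEXTSYNTH_API_KEY (a hypothetical variable name)."""
    req = build_completion_request(prompt, os.environ["TEXTSYNTH_API_KEY"], engine_id, **params)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def completions_as_list(answer):
    """`text` is a string when n = 1 and an array of strings when n > 1."""
    text = answer["text"]
    return [text] if isinstance(text, str) else list(text)
```

For example, complete("Once upon a time, there was", max_tokens=20) sends the same body as the curl command; extra keyword arguments such as top_k or stop are passed through unchanged into the JSON body.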

1.4 Translations

This endpoint translates one or several texts into a target language. The source language can be automatically detected or explicitly provided. The API syntax to translate is:
POST https://api.textsynth.com/v1/engines/{engine_id}/translate
where engine_id is the selected engine. Currently only m2m100_1_2B is supported.
Request body (JSON)
  • text: array of strings.

    Each string is an independent text to translate. Batches of at most 64 texts can be provided.

  • source_lang: string.

    Two or three character ISO language code for the source language. The special value "auto" enables automatic detection of the source language. Language auto-detection does not support all languages and is based on heuristics, so if you know the source language you should specify it explicitly.

    M2M100 supports the following languages:

    Code  Language          Code  Language          Code  Language          Code  Language
    af    Afrikaans         sq    Albanian          am    Amharic           ar    Arabic
    hy    Armenian          ast   Asturian          az    Azerbaijani       ba    Bashkir
    be    Belarusian        bn    Bengali           bs    Bosnian           bg    Bulgarian
    my    Burmese           ca    Catalan           ceb   Cebuano           km    Central Khmer
    zh    Chinese           hr    Croatian          cs    Czech             da    Danish
    nl    Dutch             en    English           et    Estonian          fi    Finnish
    fr    French            ff    Fulah             gl    Galician          lg    Ganda
    ka    Georgian          de    German            el    Greek             gu    Gujarati
    ht    Haitian Creole    ha    Hausa             he    Hebrew            hi    Hindi
    hu    Hungarian         is    Icelandic         ig    Igbo              ilo   Iloko
    id    Indonesian        ga    Irish             it    Italian           ja    Japanese
    jv    Javanese          kn    Kannada           kk    Kazakh            ko    Korean
    lo    Lao               lv    Latvian           ln    Lingala           lt    Lithuanian
    lb    Luxembourgish     mk    Macedonian        mg    Malagasy          ms    Malay
    ml    Malayalam         mr    Marathi           mn    Mongolian         ne    Nepali
    nso   Northern Sotho    no    Norwegian         oc    Occitan           or    Oriya
    pa    Panjabi           ps    Pashto            fa    Persian           pl    Polish
    pt    Portuguese        ro    Romanian          ru    Russian           gd    Scottish Gaelic
    sr    Serbian           sd    Sindhi            si    Sinhala           sk    Slovak
    sl    Slovenian         so    Somali            es    Spanish           su    Sundanese
    sw    Swahili           ss    Swati             sv    Swedish           tl    Tagalog
    ta    Tamil             th    Thai              tn    Tswana            tr    Turkish
    uk    Ukrainian         ur    Urdu              uz    Uzbek             vi    Vietnamese
    cy    Welsh             fy    Western Frisian   wo    Wolof             xh    Xhosa
    yi    Yiddish           yo    Yoruba            zu    Zulu

  • target_lang: string.

    Two or three character ISO language code for the target language.

  • num_beams: optional integer (range: 1 to 5, default = 4).

    Number of beams used to generate the translated text. The translation is usually better with a larger number of beams. Each beam requires generating a separate translated text, so the number of generated tokens is multiplied by the number of beams.

  • split_sentences: optional boolean (default = true).

    The translation model only translates one sentence at a time, so the input must be split into sentences. When split_sentences = true (default), each input text is automatically split into sentences using heuristics specific to the source language.
    If you are sure that each input text contains only one sentence, it is better to disable the automatic sentence splitting.
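A request body for this endpoint can be assembled and checked against the documented limits (at most 64 texts per batch, num_beams from 1 to 5) before it is sent. The Python helpers below are hypothetical; rejecting an empty list of texts is an extra assumption not stated above.

```python
def translate_body(texts, source_lang, target_lang, num_beams=4, split_sentences=True):
    """Build a translate request body, checking the documented limits first."""
    texts = list(texts)
    if not 1 <= len(texts) <= 64:
        # At most 64 texts per batch; rejecting an empty batch is an assumption.
        raise ValueError("provide between 1 and 64 texts per request")
    if not 1 <= num_beams <= 5:
        raise ValueError("num_beams must be in the range 1 to 5")
    return {
        "text": texts,
        "source_lang": source_lang,  # ISO code, or "auto" for auto-detection
        "target_lang": target_lang,
        "num_beams": num_beams,
        "split_sentences": split_sentences,
    }

def batched(texts, size=64):
    """Split a long list of texts into batches the endpoint accepts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

A larger collection could then be sent as one request per batch, e.g. [translate_body(b, "en", "fr") for b in batched(all_texts)].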

Answer (JSON)
  • translations: array of objects.

    Each object has the following properties:

    • text: string

      Translated text

    • detected_source_lang: string

      ISO language code corresponding to the detected language (identical to source_lang if language auto-detection is not enabled).

  • input_tokens: integer

    The total number of input tokens. It is useful to estimate the amount of compute resources used by the request.

  • output_tokens: integer

    The total number of generated tokens. It is useful to estimate the amount of compute resources used by the request.

Example
Request:
curl https://api.textsynth.com/v1/engines/m2m100_1_2B/translate \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"text": ["The quick brown fox jumps over the lazy dog."], "source_lang": "en", "target_lang": "fr" }'
Answer:
{
    "translations": [{"detected_source_lang":"en","text":"Le renard brun rapide saute sur le chien paresseux."}],
    "input_tokens": 18,
    "output_tokens": 85
}

1.5 Log probabilities

This endpoint returns the logarithm of the probability that a continuation is generated after a context. It can be used to answer questions when only a few answers (such as yes/no) are possible. It can also be used to benchmark the models. The API syntax to get the log probabilities is:
POST https://api.textsynth.com/v1/engines/{engine_id}/logprob
where engine_id is the selected engine.
Request body (JSON)
  • context: string.

    If it is an empty string, the context is set to the End-Of-Text token.

  • continuation: string.

    Must be a non-empty string.

Answer (JSON)
  • logprob: double

    Logarithm of the probability of generating continuation when it is preceded by context. It is always <= 0.

  • is_greedy: boolean

    true if continuation would be generated by greedy sampling from context.

  • input_tokens: integer

    The total number of input tokens. It is useful to estimate the amount of compute resources used by the request.

Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/logprob \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"context": "The quick brown fox jumps over the lazy", "continuation": " dog"}'
Answer:
{
    "logprob": -0.0494835916522837,
    "is_greedy": true,
    "input_tokens": 9
}
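For the yes/no use case mentioned at the start of this section, one would call this endpoint once per candidate continuation (for example " yes" and " no") with the same context, then compare the returned logprob values. The Python sketch below (rank_answers is a hypothetical name) renormalizes the candidates' probabilities so that they sum to 1 over the candidate set only; the input values in practice would come from the API.

```python
import math

def rank_answers(logprobs):
    """Turn per-candidate log probabilities into normalized probabilities.

    `logprobs` maps each candidate continuation to the logprob returned for it
    with the same context. Returns (candidate, probability) pairs sorted from
    most to least likely.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logprobs.values())
    weights = {answer: math.exp(lp - m) for answer, lp in logprobs.items()}
    total = sum(weights.values())
    return sorted(
        ((answer, w / total) for answer, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

For instance, candidates with probabilities 0.3 and 0.1 under the model become 0.75 and 0.25 once renormalized over the two answers.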

1.6 Tokenization

This endpoint returns the token indexes corresponding to a given text. It is useful, for example, to know the exact number of tokens in a text or to specify logit biases for the completion endpoint. The tokens are specific to a given model. The API syntax to tokenize a text is:
POST https://api.textsynth.com/v1/engines/{engine_id}/tokenize
where engine_id is the selected engine.
Request body (JSON)
  • text: string.

    Input text.

Answer (JSON)
  • tokens: array of integers

    Token indexes corresponding to the input text.

Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/tokenize \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"text": "The quick brown fox jumps over the lazy dog"}'
Answer:
{"tokens":[464,2068,7586,21831,18045,625,262,16931,3290]}
Note: the tokenize endpoint is free.
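One use of the token count is to choose a max_tokens value so that the prompt and the generated text fit in the model's context. A minimal Python sketch using the context lengths stated in section 1.3 (2048 for GPT-J, 1024 for the other models); the table applies that wording literally to every engine other than gptj_6B, which is an assumption, and max_new_tokens is a hypothetical helper.

```python
# Context lengths from section 1.3; engines not listed are assumed to use 1024.
MAX_CONTEXT = {"gptj_6B": 2048}
DEFAULT_MAX_CONTEXT = 1024

def max_new_tokens(engine_id, prompt_tokens):
    """Largest max_tokens that keeps prompt + generated text within the context."""
    limit = MAX_CONTEXT.get(engine_id, DEFAULT_MAX_CONTEXT)
    return max(0, limit - len(prompt_tokens))
```

For the 9 tokens returned in the example above, this gives 2048 - 9 = 2039 on gptj_6B.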

2 Prompt tuning

In addition to pure text completion, you can tune your prompt (input text) so that the model solves a precise task such as:

  • sentiment analysis
  • classification
  • entity extraction
  • question answering
  • grammar and spelling correction
  • machine translation
  • chatbot
  • summarization
Some examples can be found here (nlpcloud.io blog) or here (OpenAI documentation).
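As an illustration of prompt tuning for sentiment analysis, a few-shot prompt lists a short instruction, some solved examples, and then the unsolved case, leaving the answer for the model to complete. The Python helper and the example reviews below are hypothetical.

```python
def few_shot_prompt(examples, query,
                    instruction="Classify the sentiment of each review as Positive or Negative."):
    """Build a few-shot prompt: instruction, solved examples, then the open query."""
    lines = [instruction, ""]
    for review, label in examples:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry has no label, so the model's completion supplies it.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)
```

The resulting string would be sent as prompt to the completions endpoint, for example with a small max_tokens and stop set to "\n" so that generation ends after the label.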

3 Model results

This section presents objective results of the various models on tasks from the Language Model Evaluation Harness. These results were computed using the TextSynth API so that they can be fully reproduced (patch: lm_evaluation_harness_textsynth.tar.gz). You can compare them with other results independently obtained by EleutherAI.

Zero-shot performance:

Model            LAMBADA Acc ↑    Winogrande ↑  Hellaswag ↑  PIQA ↑  COQA F1 ↑  Average ↑
gptj_6B          69.1%            64.5%         66.3%        76.2%   66.8%      68.6%
fairseq_gpt_13B  72.2%            66.9%         72.7%        79.0%   70.6%      72.3%
gptneox_20B      71.4%            65.9%         68.9%        77.0%   71.4%      70.9%

Few-shot translation (K=5) (WMT14 BLEU scores):

Model     fr→en ↑  en→fr ↑
gptj_6B   34.3     28.7
boris_6B  35.9     37.2

Note that these models were trained on data that may contain test set contamination, so some of these results may not reflect actual model performance.

4 Changelog

  • 2022-05-02: added the translate endpoint and the m2m100_1_2B model.
  • 2022-05-02: added the repetition_penalty and typical_p parameters.
  • 2022-04-20: added the n parameter.
  • 2022-04-20: the stop parameter can now be used with streaming output.
  • 2022-04-04: added the logit_bias, presence_penalty, frequency_penalty parameters to the completion endpoint.
  • 2022-04-04: added the tokenize endpoint.