Authorization: Bearer YOUR_API_KEY
engine_id to operate. The following engines are currently available:
gptj_6B
: GPT-J is a language model with 6 billion parameters trained on the Pile (825 GB of text data) published by EleutherAI. Its main language is English but it is also fluent in several other languages. It was also trained on several programming languages.
boris_6B
: Boris is a fine-tuned version of GPT-J for the French language. Use this model if you want the best performance with the French language.
fairseq_gpt_13B
: Fairseq GPT 13B is an English language model with 13 billion parameters. Its training corpus is less diverse than that of GPT-J, but it performs better, at least on pure English language tasks.
gptneox_20B
: GPT-NeoX-20B is the largest publicly available English language model with 20 billion parameters. It was trained on the same corpus as GPT-J.
flan_t5_xxl
: Flan-T5-XXL is an 11 billion parameter model fine-tuned to answer questions.
codegen_6B_mono
: CodeGen-6B-mono is a 6 billion parameter model specialized to generate source code. It was mostly trained on Python code.
m2m100_1_2B
: M2M100 1.2B is a 1.2 billion parameter language model specialized for translation. It supports multilingual translation between 100 languages. See the translate endpoint.
stable_diffusion
: Stable Diffusion is a 1 billion parameter text to image model trained to generate 512x512 pixel images from English text (sd-v1-4.ckpt checkpoint). See the text_to_image endpoint. There are specific use restrictions associated with this model.
POST https://api.textsynth.com/v1/engines/{engine_id}/completions
where engine_id is the selected engine.
prompt
: string.
The input text to complete.
max_tokens
: optional int (default = 100)
Maximum number of tokens to generate. A token represents about 4 characters for English texts. The total number of tokens (prompt + generated text) cannot exceed the model's maximum context length, which is 2048 for GPT-J and 1024 for the other models.
If the prompt length is larger than the model's maximum context length, the beginning of the prompt is discarded.
stream
: optional boolean (default = false)
If true, the output is streamed so that it is possible to display the result before the complete output is generated. Several JSON answers are output. Each answer is followed by two line feed characters.
stop
: optional string or array of string (default = null)
Stop the generation when the string(s) are encountered. The generated text does not contain the string. The length of the array is at most 5.
n
: optional integer (range: 1 to 16, default = 1)
Generate n completions from a single prompt.
temperature
: optional number (default = 1)
Sampling temperature. A higher temperature means the model will select less common tokens, leading to a larger diversity but potentially less relevant output. It is usually better to tune top_p or top_k.
top_k
: optional integer (range: 1 to 1000, default = 40)
Select the next output token among the top_k most likely ones. A higher top_k gives more diversity but a potentially less relevant output.
top_p
: optional number (range: 0 to 1, default = 0.9)
Select the next output token among the most probable ones so that their cumulative probability is larger than top_p. A higher top_p gives more diversity but a potentially less relevant output. top_p and top_k are combined, meaning that at most top_k tokens are selected. A value of 1 disables this sampling.
logit_bias
: optional object (default = {})
Modify the likelihood of the specified tokens in the completion. The specified object is a map between the token indexes and the corresponding logit bias. A negative bias reduces the likelihood of the corresponding token. The bias must be between -100 and 100. Note that the token indexes are specific to the selected model. You can use the tokenize API endpoint to retrieve the token indexes of a given model.
Example: if you want to ban the " unicorn" token for GPT-J, you can use: logit_bias: { "44986": -100 }
presence_penalty
: optional number (range: -2 to 2, default = 0)
A positive value penalizes tokens which already appeared in the generated text. Hence it forces the model to have a more diverse output.
frequency_penalty
: optional number (range: -2 to 2, default = 0)
A positive value penalizes tokens which already appeared in the generated text proportionally to their frequency. Hence it forces the model to have a more diverse output.
repetition_penalty
: optional number (default = 1)
Divide by repetition_penalty the logits corresponding to tokens which already appeared in the generated text. A value of 1 effectively disables it. See this article for more details.
typical_p
: optional number (range: 0 to 1, default = 1)
Alternative to top_p sampling: instead of selecting the tokens starting from the most probable one, start from the ones whose log likelihood is the closest to the symbol entropy. As with top_p, at most top_k tokens are selected. A value of 1 disables this sampling. See this article for more details.
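The way top_k and top_p combine can be illustrated with a small client-side sketch. It is only an illustration of the description above, not the server's implementation: the function name `filter_candidates`, the use of `>=` for the cumulative threshold, and the toy probabilities are all assumptions.

```python
def filter_candidates(probs, top_k=40, top_p=0.9):
    """Return the tokens that remain eligible for sampling.

    probs maps each token to its probability. At most top_k tokens are
    kept, and the list is cut as soon as the cumulative probability of
    the kept tokens reaches top_p (assumed threshold convention).
    """
    # Rank tokens from most to least probable and apply the top_k cap.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[:top_k]

    kept = []
    cumulative = 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        if cumulative >= top_p:
            break
    return kept


# Toy distribution (made-up values, for illustration only).
toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
print(filter_candidates(toy, top_k=40, top_p=0.9))
print(filter_candidates(toy, top_k=2, top_p=0.9))
```

With top_p = 1 the cumulative threshold is only reached once every ranked token has been kept, so only the top_k cap applies, which matches the statement that a value of 1 disables this sampling.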
text
: string or array of string
It is the completed text. If the n parameter is larger than 1, an array of strings is returned.
reached_end
: boolean
If true, indicates that this is the last answer. It is only useful with streaming output (stream = true in the request).
truncated_prompt
: bool (default = false)
If true, indicate that the prompt was truncated because it was too large compared to the model's maximum context length. Only the end of the prompt is used to generate the completion.
input_tokens
: integer
Indicate the number of input tokens. It is useful to estimate the number of compute resources used by the request.
output_tokens
: integer
Indicate the total number of generated tokens. It is useful to estimate the number of compute resources used by the request.
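When stream is true, the answers can be separated on the client by looking for the two line feed characters that follow each JSON answer. The following is a minimal sketch under that assumption; the function name `iter_stream_answers` is hypothetical, and the sample chunks are made up to show that an answer may be split across network reads.

```python
import json


def iter_stream_answers(chunks):
    """Yield decoded JSON answers from an iterable of byte chunks.

    Assumes each answer is a JSON document followed by two line feeds,
    as described for the stream parameter.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Extract every complete answer currently in the buffer.
        while b"\n\n" in buffer:
            raw, buffer = buffer.split(b"\n\n", 1)
            if raw.strip():
                yield json.loads(raw)


# Made-up chunks: the first answer is cut in the middle of a read.
sample = [
    b'{"text": " a", "reached_end": fal',
    b'se}\n\n{"text": " b", "reached_end": true}\n\n',
]
for answer in iter_stream_answers(sample):
    print(answer["text"], answer["reached_end"])
```

The loop can stop as soon as an answer with reached_end set to true has been received.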
curl https://api.textsynth.com/v1/engines/gptj_6B/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"prompt": "Once upon a time, there was", "max_tokens": 20 }'
Answer:
{ "text": " a woman who loved to get her hands on a good book. She loved to read and to tell", "reached_end": true, "input_tokens": 7, "output_tokens": 20 }
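The max_tokens description gives a rule of thumb of about 4 characters per token and a context length of 2048 tokens for GPT-J (1024 for the other models). A rough client-side check could look like the sketch below. The helper names `estimate_tokens` and `fits_context` are hypothetical, and the estimate is only a heuristic for English text; the input_tokens field of the answer gives the real count.

```python
import math

# Context lengths as stated in the max_tokens description.
CONTEXT_LENGTH = {"gptj_6B": 2048}
DEFAULT_CONTEXT_LENGTH = 1024  # "the other models"


def estimate_tokens(text):
    """Rough token count using the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / 4)


def fits_context(engine_id, prompt, max_tokens=100):
    """Return True if prompt + generated tokens are likely to fit."""
    limit = CONTEXT_LENGTH.get(engine_id, DEFAULT_CONTEXT_LENGTH)
    return estimate_tokens(prompt) + max_tokens <= limit


print(estimate_tokens("Once upon a time, there was"))
print(fits_context("gptj_6B", "Once upon a time, there was", max_tokens=20))
```

If the check fails, either max_tokens can be reduced or the prompt shortened; otherwise the beginning of the prompt is discarded by the server.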
Python example: textsynth.py
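For readers who prefer not to download the script, the curl command above can be mirrored with the Python standard library. This is a minimal sketch, not the content of textsynth.py: the function names `build_completion_request` and `complete` are hypothetical, and error handling is omitted.

```python
import json
import urllib.request

API_URL = "https://api.textsynth.com/v1/engines/{engine_id}/completions"


def build_completion_request(api_key, engine_id, prompt, **params):
    """Build (but do not send) a POST request for the completions endpoint.

    Extra keyword arguments (max_tokens, temperature, top_k, ...) are
    passed through unchanged in the JSON body.
    """
    payload = {"prompt": prompt, **params}
    return urllib.request.Request(
        API_URL.format(engine_id=engine_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def complete(api_key, engine_id, prompt, **params):
    """Send the request and return the decoded JSON answer."""
    request = build_completion_request(api_key, engine_id, prompt, **params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `complete("YOUR_API_KEY", "gptj_6B", "Once upon a time, there was", max_tokens=20)` would send the same request as the curl example.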
POST https://api.textsynth.com/v1/engines/{engine_id}/translate
where engine_id is the selected engine. Currently only m2m100_1_2B is supported.
text
: array of strings.
Each string is an independent text to translate. Batches of at most 64 texts can be provided.
source_lang
: string.
Two or three character ISO language code for the source language. The special value "auto" indicates to auto-detect the source language. The language auto-detection does not support all languages and is based on heuristics. Hence if you know the source language you should explicitly indicate it.
M2M100 supports the following languages:
Code | Language | Code | Language | Code | Language | Code | Language |
---|---|---|---|---|---|---|---|
af | Afrikaans | sq | Albanian | am | Amharic | ar | Arabic |
hy | Armenian | ast | Asturian | az | Azerbaijani | ba | Bashkir |
be | Belarusian | bn | Bengali | bs | Bosnian | bg | Bulgarian |
my | Burmese | ca | Catalan | ceb | Cebuano | km | Central Khmer |
zh | Chinese | hr | Croatian | cs | Czech | da | Danish |
nl | Dutch | en | English | et | Estonian | fi | Finnish |
fr | French | ff | Fulah | gl | Galician | lg | Ganda |
ka | Georgian | de | German | el | Greek | gu | Gujarati |
ht | Haitian Creole | ha | Hausa | he | Hebrew | hi | Hindi |
hu | Hungarian | is | Icelandic | ig | Igbo | ilo | Iloko |
id | Indonesian | ga | Irish | it | Italian | ja | Japanese |
jv | Javanese | kn | Kannada | kk | Kazakh | ko | Korean |
lo | Lao | lv | Latvian | ln | Lingala | lt | Lithuanian |
lb | Luxembourgish | mk | Macedonian | mg | Malagasy | ms | Malay |
ml | Malayalam | mr | Marathi | mn | Mongolian | ne | Nepali |
nso | Northern Sotho | no | Norwegian | oc | Occitan | or | Oriya |
pa | Panjabi | ps | Pashto | fa | Persian | pl | Polish |
pt | Portuguese | ro | Romanian | ru | Russian | gd | Scottish Gaelic |
sr | Serbian | sd | Sindhi | si | Sinhala | sk | Slovak |
sl | Slovenian | so | Somali | es | Spanish | su | Sundanese |
sw | Swahili | ss | Swati | sv | Swedish | tl | Tagalog |
ta | Tamil | th | Thai | tn | Tswana | tr | Turkish |
uk | Ukrainian | ur | Urdu | uz | Uzbek | vi | Vietnamese |
cy | Welsh | fy | Western Frisian | wo | Wolof | xh | Xhosa |
yi | Yiddish | yo | Yoruba | zu | Zulu |
target_lang
: string.
Two or three character ISO language code for the target language.
num_beams
: integer (range: 1 to 5, default = 4).
Number of beams used to generate the translated text. The translation is usually better with a larger number of beams. Each beam requires generating a separate translated text, hence the number of generated tokens is multiplied by the number of beams.
split_sentences
: optional boolean (default = true).
The translation model only translates one sentence at a time. Hence the input must be split into sentences. When split_sentences = true (default), each input text is automatically split into sentences using source language specific heuristics.
If you are sure that each input text contains only one sentence, it is better to disable the automatic sentence splitting.
translations
: array of objects.
Each object has the following properties:
text
: string
Translated text
detected_source_lang
: string
ISO language code corresponding to the detected language (identical to source_lang if language auto-detection is not enabled)
input_tokens
: integer
Indicate the total number of input tokens. It is useful to estimate the number of compute resources used by the request.
output_tokens
: integer
Indicate the total number of generated tokens. It is useful to estimate the number of compute resources used by the request.
curl https://api.textsynth.com/v1/engines/m2m100_1_2B/translate \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"text": ["The quick brown fox jumps over the lazy dog."], "source_lang": "en", "target_lang": "fr" }'
Answer:
{ "translations": [{"detected_source_lang":"en","text":"Le renard brun rapide saute sur le chien paresseux."}], "input_tokens": 18, "output_tokens": 85 }
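Since a request accepts at most 64 texts, a larger list has to be split into several requests. The sketch below only builds the JSON bodies; the function name `translation_payloads` is hypothetical and sending the requests is left out.

```python
def translation_payloads(texts, source_lang, target_lang, batch_size=64):
    """Return one request body per batch of at most batch_size texts."""
    if not 1 <= batch_size <= 64:
        raise ValueError("batch_size must be between 1 and 64")
    payloads = []
    for start in range(0, len(texts), batch_size):
        payloads.append({
            "text": texts[start:start + batch_size],
            "source_lang": source_lang,
            "target_lang": target_lang,
        })
    return payloads


# 130 placeholder sentences are split into batches of 64, 64 and 2.
sentences = [f"Sentence number {i}." for i in range(130)]
for body in translation_payloads(sentences, "en", "fr"):
    print(len(body["text"]))
```

The translations of each answer come back in the translations array, so the results of successive batches can simply be concatenated.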
continuation is generated after a context. It can be used to answer questions when only a few answers (such as yes/no) are possible. It can also be used to benchmark the models.
The API syntax to get the log probabilities is:
POST https://api.textsynth.com/v1/engines/{engine_id}/logprob
where engine_id is the selected engine.
context
: string.
If it is an empty string, the context is set to the End-Of-Text token.
continuation
: string.
Must be a non-empty string.
logprob
: double
Logarithm of the probability of generating continuation preceded by context. It corresponds to the sum of the logarithms of the probabilities of the tokens of continuation. It is always <= 0.
num_tokens
: integer
Number of tokens in continuation.
is_greedy
: boolean
true if continuation would be generated by greedy sampling from context.
input_tokens
: integer
Indicate the total number of input tokens. It is useful to estimate the number of compute resources used by the request.
curl https://api.textsynth.com/v1/engines/gptj_6B/logprob \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"context": "The quick brown fox jumps over the lazy", "continuation": " dog"}'
Answer:
{ "logprob": -0.0494835916522837, "is_greedy": true, "input_tokens": 9 }
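To answer a question with only a few possible answers, one approach is to request the logprob of each candidate continuation for the same context and compare them. The sketch below assumes the logprob values have already been collected; the function names and the example scores are made up for illustration.

```python
import math


def best_choice(logprobs):
    """Return the continuation with the highest log probability."""
    return max(logprobs, key=logprobs.get)


def normalize_choices(logprobs):
    """Turn {continuation: logprob} into probabilities that sum to 1.

    The largest logprob is subtracted before exponentiation to avoid
    underflow with very negative values.
    """
    top = max(logprobs.values())
    weights = {c: math.exp(lp - top) for c, lp in logprobs.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Made-up logprob values for two candidate answers.
scores = {" yes": -0.2, " no": -1.7}
print(best_choice(scores))
print(normalize_choices(scores))
```

Because logprob is the sum over the tokens of continuation, candidates with different token counts are not directly comparable; dividing by num_tokens is one possible normalization, at the caller's discretion.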
POST https://api.textsynth.com/v1/engines/{engine_id}/tokenize
where engine_id is the selected engine.
text
: string.
Input text.
tokens
: array of integers
Token indexes corresponding to the input text.
curl https://api.textsynth.com/v1/engines/gptj_6B/tokenize \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"text": "The quick brown fox jumps over the lazy dog"}'
Answer:
{"tokens":[464,2068,7586,21831,18045,625,262,16931,3290]}
Note: the tokenize endpoint is free.
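The token indexes returned by this endpoint can be turned into a logit_bias object for the completions endpoint. A minimal sketch, with the hypothetical helper name `build_logit_bias`:

```python
def build_logit_bias(token_ids, bias=-100):
    """Map each token index to the same bias, using string keys.

    The bias must be between -100 and 100; -100 effectively bans the
    tokens. Token indexes are specific to the selected model.
    """
    if not -100 <= bias <= 100:
        raise ValueError("bias must be between -100 and 100")
    return {str(token_id): bias for token_id in token_ids}


# Index of the " unicorn" token for GPT-J, as given in the logit_bias
# description of the completions endpoint.
print(build_logit_bias([44986]))
```

The resulting object can be passed as the logit_bias parameter of a completion request for the same engine.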
POST https://api.textsynth.com/v1/engines/{engine_id}/text_to_image
where engine_id is the selected engine. Currently only stable_diffusion is supported.
prompt
: string.
The text prompt. Only the first 75 tokens are used.
image_count
: optional integer (default = 1).
Number of images to generate. At most 4 images can be generated with one request. The generation of an image takes about 2 seconds.
width
: optional integer (default = 512).
height
: optional integer (default = 512).
Width and height in pixels of the generated images. The only accepted values are 384, 512, 640 and 768. The product of width and height must be <= 393216 (hence a maximum size of 512x768 or 768x512). The model is trained with 512x512 images, so the best results are obtained with this size.
timesteps
: optional integer (default = 50).
Number of diffusion steps. Larger values usually give a better result but the image generation takes longer.
guidance_scale
: optional number (default = 7.5).
Guidance Scale. A larger value gives a larger importance to the text prompt with respect to a random image generation.
seed
: optional integer (default = 0).
Random number seed. A non-zero seed always yields the same images. It is useful to get deterministic results and to try different sets of parameters.
images
: array of objects.
Each object has the following property:
data
: string
Base64 encoded generated JPEG image.
curl https://api.textsynth.com/v1/engines/stable_diffusion/text_to_image \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"prompt": "an astronaut riding a horse" }'
Answer:
{ "images": [{"data":"..."}] }
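Two client-side steps are typically needed around this endpoint: checking the requested size before sending, and decoding the Base64 data afterwards. The sketch below covers both; the function names are hypothetical and the constants are taken from the width and height description.

```python
import base64

ALLOWED_SIDES = (384, 512, 640, 768)
MAX_PIXELS = 393216  # maximum product of width and height


def is_valid_size(width, height):
    """Check a width/height pair against the documented constraints."""
    return (
        width in ALLOWED_SIDES
        and height in ALLOWED_SIDES
        and width * height <= MAX_PIXELS
    )


def decode_images(answer):
    """Return the raw JPEG bytes of each image in a text_to_image answer."""
    return [base64.b64decode(image["data"]) for image in answer["images"]]


print(is_valid_size(512, 768))
print(is_valid_size(768, 768))
```

Each element returned by `decode_images` can be written to a file opened in binary mode, for example with a .jpg extension.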
credits
: integer
Number of remaining credits multiplied by 1e9.
curl https://api.textsynth.com/v1/credits \
    -H "Authorization: Bearer YOUR_API_KEY"
Answer:
{"credits":123456789}
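Because the returned value is the number of remaining credits multiplied by 1e9, it has to be divided by 1e9 to be read as credits. A one-function sketch, with the hypothetical name `remaining_credits`:

```python
def remaining_credits(answer):
    """Convert the scaled integer of a credits answer back to credits."""
    return answer["credits"] / 1e9


# Using the example answer above.
print(remaining_credits({"credits": 123456789}))
```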
In addition to pure text completion, you can tune your prompt (input text) so that the model solves a precise task such as:
We present in this section the objective results of the various models on tasks from the Language Model Evaluation Harness. These results were computed using the TextSynth API so that they can be fully reproduced (patch: lm_evaluation_harness_textsynth.tar.gz). You can compare them with other results independently obtained by EleutherAI.
Zero-shot performance:
Model | LAMBADA (acc) | Winogrande (acc) | Hellaswag (acc_norm) | PIQA (acc) | COQA (f1) | Average ↑ |
---|---|---|---|---|---|---|
gptj_6B | 69.1% | 64.4% | 66.2% | 75.4% | 66.4% | 68.3% |
fairseq_gpt_13B | 71.2% | 67.6% | 72.5% | 77.4% | 70.6% | 71.9% |
gptneox_20B | 72.6% | 65.8% | 71.3% | 77.3% | 72.9% | 72.0% |
flan_t5_xxl | 77.7% | 73.4% | 71.5% | 77.6% | 71.8% | 74.4% |
Few-shot translation (K=5) (WMT14 BLEU scores):
Model | fr→en ↑ | en→fr ↑ |
---|---|---|
gptj_6B | 34.3 | 28.7 |
boris_6B | 35.9 | 37.2 |
Note that the training data of these models may contain parts of the test sets (test set contamination), so some of these results might not reflect the actual model performance.
flan_t5_xxl model.
codegen_6B_mono model.
text_to_image endpoint.
credits endpoint.
num_tokens property in the logprob endpoint. Fixed handling of escaped surrogate pairs in the JSON request body.
m2m100_1_2B model.
repetition_penalty and typical_p parameters.
n parameter.
stop parameter can now be used with streaming output.
logit_bias, presence_penalty, frequency_penalty parameters to the completion endpoint.
tokenize endpoint.