1 API description

1.1 Authentication

To use the REST API, you must create an account and get your API key. Each request must include the following header:
Authorization: Bearer YOUR_API_KEY
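
For illustration, a minimal Python sketch of attaching this header with the requests library (the helper name and the use of requests are not part of the API, only an assumption for the example):

import requests

API_KEY = "YOUR_API_KEY"  # the key obtained when creating your account
API_BASE = "https://api.textsynth.com"

def textsynth_post(path, payload):
    # Every request carries the Authorization header shown above.
    headers = {
        "Authorization": "Bearer " + API_KEY,
        "Content-Type": "application/json",
    }
    response = requests.post(API_BASE + path, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()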

1.2 Text completions

The API syntax for text completions is:
POST https://api.textsynth.com/v1/engines/{engine_id}/completions
where engine_id is the selected engine. The following engines are currently available:
Request body (JSON)
Answer (JSON)
When streaming output is requested, several answers may be returned. Each answer is always followed by two line feed characters.
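
As a sketch, here is one way such a stream could be consumed in Python with the requests library. The stream request field used below is an assumption (the request body parameters are not listed above); the code simply splits the response on the two line feed characters between answers:

import json
import requests

resp = requests.post(
    "https://api.textsynth.com/v1/engines/gptj_6B/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    # "stream": True is an assumed request field, not documented above.
    json={"prompt": "Once upon a time, there was", "max_tokens": 20, "stream": True},
    stream=True,
)
buffer = ""
for chunk in resp.iter_content(chunk_size=None):
    buffer += chunk.decode("utf-8")
    # Each answer is terminated by two line feed characters.
    while "\n\n" in buffer:
        answer_text, buffer = buffer.split("\n\n", 1)
        if answer_text.strip():
            answer = json.loads(answer_text)
            print(answer["text"], end="", flush=True)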
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"prompt": "Once upon a time, there was", "max_tokens": 20 }'
Answer:
{
    "text": " a woman who loved to get her hands on a good book. She loved to read and to tell",
    "reached_end": true,
    "total_tokens": 27
}
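
The same request can be made from Python with the requests library (a sketch using only the documented prompt and max_tokens fields):

import requests

resp = requests.post(
    "https://api.textsynth.com/v1/engines/gptj_6B/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"prompt": "Once upon a time, there was", "max_tokens": 20},
)
result = resp.json()
print(result["text"])          # generated continuation
print(result["reached_end"])   # true when the model stopped by itself
print(result["total_tokens"])  # number of tokens processed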

1.3 Log probabilities

This endpoint returns the logarithm of the probability that a continuation is generated after a context. It can be used to answer questions when only a few answers (such as yes/no) are possible. It can also be used to benchmark the models. The API syntax to get the log probabilities is:
POST https://api.textsynth.com/v1/engines/{engine_id}/logprob
where engine_id is the selected engine.
Request body (JSON)
Answer (JSON)
Example
Request:
curl https://api.textsynth.com/v1/engines/gptj_6B/logprob \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -d '{"context": "The quick brown fox jumps over the lazy", "continuation": " dog"}'
Answer:
{
    "logprob": -0.049430359022548,
    "is_greedy": true,
    "total_tokens": 9
}
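
As a sketch of the yes/no use case mentioned above (Python with the requests library; the question and the two candidate answers are only illustrative):

import requests

def logprob(context, continuation):
    resp = requests.post(
        "https://api.textsynth.com/v1/engines/gptj_6B/logprob",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"context": context, "continuation": continuation},
    )
    return resp.json()["logprob"]

context = "Question: Is the sky blue on a clear day? Answer:"
# Pick whichever of the two allowed answers the model finds more probable.
best = max([" yes", " no"], key=lambda c: logprob(context, c))
print(best)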

2 Prompt tuning

In addition to pure text completion, you can tune your prompt (input text) so that the model solves a more precise task.

Some examples can be found on the nlpcloud.io blog or in the OpenAI documentation.
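
As an illustrative sketch only (the prompt text below is not from the documentation), a few-shot translation prompt can be sent through the completions endpoint; the examples in the prompt show the model the task format and the completion supplies the missing translation:

import requests

prompt = (
    "English: Hello, how are you?\n"
    "French: Bonjour, comment allez-vous ?\n"
    "English: The weather is nice today.\n"
    "French: Il fait beau aujourd'hui.\n"
    "English: Where is the train station?\n"
    "French:"
)
resp = requests.post(
    "https://api.textsynth.com/v1/engines/gptj_6B/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"prompt": prompt, "max_tokens": 30},
)
print(resp.json()["text"])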

3 Model results

We present in this section the objective results of the various models on tasks from the Language Model Evaluation Harness. These results were computed using the TextSynth API so that they can be fully reproduced. You can compare them with other results independently obtained by EleutherAI.

Zero-shot performance:

Model            LAMBADA PPL ↓  LAMBADA Acc ↑  Winogrande ↑  Hellaswag ↑  PIQA ↑  COQA f1 ↑  Average ↑
gptj_6B          4.13           69.1%          64.5%         66.3%        76.2%   66.8%      68.6%
fairseq_gpt_13B  3.58           72.2%          66.9%         72.7%        79.0%   70.6%      72.3%

Few-shot translation (K=5) (WMT14 BLEU scores):

Model     fr→en ↑  en→fr ↑
gptj_6B   34.3     28.7
boris_6B  35.9     37.2

Note that these models were trained on data that may contain test set contamination, so some of these results might not reflect the actual model performance.