Tasks
Each ARC-AGI task consists of three to five input/output example pairs followed by a final test pair for which only the input is given. Each task tests the use of a specific learned skill based on a minimal number of cognitive priors.
Tasks are represented as JSON lists of integers. These JSON objects can also be represented visually as a grid of colors using an ARC-AGI task viewer.
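As a rough illustration of the grid representation, the sketch below prints a grid as text. It assumes a grid is a list of rows and each row is a list of integers. The `render` helper and the sample values are hypothetical and not part of any official tooling.

```python
# Hypothetical 2x2 grid: each inner list is one row, each integer is one cell's color code.
grid = [
    [1, 0],
    [0, 4],
]


def render(grid):
    """Return a plain-text picture of the grid, one row per line."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


print(render(grid))
```

A task viewer does the same thing with colored cells instead of digits.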
A successful submission is a pixel-perfect description (color and position) of the final task's output.
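The pixel-perfect requirement can be sketched as an exact comparison between the predicted grid and the expected grid. This is a minimal sketch assuming grids are nested lists of integers. The function name `is_pixel_perfect` is hypothetical, and this is not the official scoring code.

```python
def is_pixel_perfect(predicted, expected):
    """Return True only if both grids have the same shape and the same value in every cell."""
    # Equality on nested lists compares row count, row lengths, and each cell in order.
    return predicted == expected


expected = [[8, 8], [8, 8]]
print(is_pixel_perfect([[8, 8], [8, 8]], expected))  # identical grid
print(is_pixel_perfect([[8, 8], [8, 0]], expected))  # one wrong cell
print(is_pixel_perfect([[8, 8]], expected))          # wrong shape
```

Under this check there is no partial credit: a single wrong cell or a wrong grid size counts as a failure.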
Task Data
The following datasets are associated with the ARC Prize competition:
Public training set
Public evaluation set
Private evaluation set
Public
The publicly available data is to be used for training and evaluation.
The public training set contains 400 task files you will use to train your algorithm.
The public evaluation set contains 400 task files you will use to test the performance of your algorithm.
To ensure fair evaluation results, be sure not to leak information from the evaluation set into your algorithm (e.g. by looking at the tasks in the evaluation set yourself during development, or by repeatedly modifying an algorithm while using its evaluation score as feedback).
The source of truth for this data is available on François Chollet's ARC-AGI Repository, which contains 800 total tasks.
Private
The private evaluation set contains 100 task files.
The ARC-AGI leaderboard is scored on these 100 private evaluation tasks, which are held privately on Kaggle so that models cannot be trained on them. They are not included in the public tasks, but they use the same structure and cognitive priors.
Please note that the public training set consists of simpler tasks, whereas the public evaluation set is roughly the same level of difficulty as the private evaluation set.
Set Difficulty
The public training set is significantly easier than the other two sets (public evaluation and private evaluation) since it contains many "curriculum" type tasks intended to demonstrate Core Knowledge systems. It's like a tutorial level.
The public evaluation set and the private evaluation set are intended to be the same difficulty.
Format
As mentioned above, tasks are stored in JSON format. Each JSON file consists of two key-value pairs.
train : a list of three to five input/output pairs. These are used for your algorithm to infer a rule.
test : a single input/output pair. Your model should apply the inferred rule from the train set and construct an output solution. You will have access to the output test solution on the public data. The output solution on the private evaluation set will not be revealed.
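The two keys can be read with a few lines of Python. The sketch below is illustrative only. It assumes each pair is an object with `input` and `output` keys and that `test` is stored as a list holding one pair. The inline task and the `solve` rule are made up for the demonstration and are not taken from the datasets.

```python
import json

# Hypothetical toy task, inlined so the example runs without any files.
raw = """
{
  "train": [
    {"input": [[0, 1]], "output": [[1, 0]]},
    {"input": [[2, 0]], "output": [[0, 2]]},
    {"input": [[0, 3]], "output": [[3, 0]]}
  ],
  "test": [
    {"input": [[5, 0]], "output": [[0, 5]]}
  ]
}
"""
task = json.loads(raw)


def solve(grid):
    """Hypothetical rule for this toy task: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


# Check the candidate rule against every train pair before trusting it.
train_ok = all(solve(pair["input"]) == pair["output"] for pair in task["train"])

# Apply the rule to the test input; on public data the output is there to compare against.
test_pair = task["test"][0]
prediction = solve(test_pair["input"])
print(train_ok, prediction == test_pair["output"])
```

On the private evaluation set only the comparison step is unavailable to you, since the test output is never revealed.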
Here is an example of a simple ARC-AGI task that has three training pairs along with a single test pair. Each grid is 2x2. There are four colors represented by the integers 1, 4, 6, and 8. Which actual color (red/green/blue/black) is applied to each integer is arbitrary and up to you.