MIT develops ChartNet dataset to improve AI chart understanding
Advanced chart understanding could strengthen the use of AI in finance, science, and business analytics by improving the accuracy of data interpretation.
MIT researchers have developed a new dataset, ChartNet, to improve how vision-language models interpret charts and other graphical data.
The dataset is designed to help AI systems better combine visual, numerical, and linguistic information, a task that remains difficult even for advanced models. MIT said chart understanding is important for applications such as business trend analysis, financial reporting, and scientific figure interpretation.
ChartNet contains more than one million synthetic chart images, each paired with supporting code, numerical tables, textual descriptions, and question-and-answer pairs. The dataset was created through an automated pipeline that generates and augments chart examples, supported by quality checks to ensure that the code is executable and the resulting charts are accurate and clean.
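The article does not describe ChartNet's file format or its validation code. As a rough illustration only, the sketch below assumes a hypothetical record layout that pairs chart code with a data table, a description, and question-and-answer pairs. The names ChartRecord, code_executes, and passes_quality_check are invented for this example. It shows one way a pipeline might reject examples whose chart code fails to run or whose supporting fields are empty. Real chart code would call a plotting library. The stand-in snippets use plain Python so the example needs only the standard library.

```python
from dataclasses import dataclass, field


@dataclass
class ChartRecord:
    """Hypothetical layout for one chart example (not ChartNet's published schema)."""
    plot_code: str                # code that would render the chart image
    table: list                   # underlying numerical data, one dict per row
    description: str              # textual summary of the chart
    qa_pairs: list = field(default_factory=list)  # (question, answer) tuples


def code_executes(source: str) -> bool:
    """Return True if the chart code compiles and runs without raising.

    Assumption: exec() in an empty namespace stands in for a real, sandboxed
    execution step. Never run untrusted code this way in production.
    """
    try:
        exec(compile(source, "<chart>", "exec"), {})
    except Exception:  # includes SyntaxError raised by compile()
        return False
    return True


def passes_quality_check(record: ChartRecord) -> bool:
    """Hypothetical filter: runnable code plus non-empty supporting fields."""
    return (
        code_executes(record.plot_code)
        and len(record.table) > 0
        and bool(record.description.strip())
        and all(q.strip() and a.strip() for q, a in record.qa_pairs)
    )


# Usage: one well-formed record and one whose code has a syntax error.
good = ChartRecord(
    plot_code="xs = [2021, 2022]\nys = [1.5, 2.0]\nassert len(xs) == len(ys)",
    table=[{"year": 2021, "revenue": 1.5}, {"year": 2022, "revenue": 2.0}],
    description="Revenue rose from 1.5 to 2.0 between 2021 and 2022.",
    qa_pairs=[("What was revenue in 2022?", "2.0")],
)
broken = ChartRecord(
    plot_code="ys = [1.5, 2.0\n",  # unclosed bracket, so compile() fails
    table=good.table,
    description=good.description,
    qa_pairs=good.qa_pairs,
)

print(passes_quality_check(good), passes_quality_check(broken))
```

A real pipeline would also need to check that the rendered chart matches its table, which this sketch does not attempt.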
The researchers developed ChartNet to address a key limitation in current AI systems: the lack of large, high-quality training data for robust chart interpretation. Many existing datasets rely on a limited number of chart images collected from the internet and lack the supporting information models need to understand the underlying data.
MIT researchers used ChartNet to train several open-source vision-language models, including IBM’s Granite Vision series. The dataset improved model accuracy across chart reconstruction, chart data extraction, chart summarisation, and chart question answering.
In MIT’s testing, smaller open-source models trained with ChartNet consistently outperformed much larger commercial models on several chart-interpretation tasks. The researchers said the dataset could help smaller organisations use AI for analytical work without relying solely on large proprietary systems.
Why does it matter?
ChartNet shows how better training data can improve AI performance in specialised analytical tasks. If smaller open-source models can interpret charts more accurately after training on high-quality datasets, organisations with limited budgets may gain access to stronger AI tools for business analytics, research, financial reporting, and scientific communication. The work also highlights a broader point in AI development: model capability depends not only on size, but also on the quality and structure of training data.
