OpenAI develops IndQA to improve AI reasoning across India’s languages

Around 80 percent of the world’s population does not speak English as a primary language, yet most existing benchmarks that measure non-English capabilities remain limited, OpenAI noted.

By Storyboard18 | Nov 5, 2025 9:08 AM

OpenAI has introduced IndQA, a new benchmark designed to evaluate how well artificial intelligence models understand and reason about questions that matter in Indian languages and cultural contexts.

In a blog post, the company said the initiative reflects its mission to ensure that artificial general intelligence (AGI) benefits all of humanity, adding that for AI to be genuinely useful, it must perform effectively across languages and cultures. Around 80 percent of the world’s population does not speak English as a primary language, yet most existing benchmarks that measure non-English capabilities remain limited, OpenAI noted.

According to the company, widely used multilingual benchmarks such as MMMLU have reached saturation: top models now achieve similarly high scores, which reduces the benchmarks' usefulness for measuring real progress. Moreover, most current benchmarks focus on translation or multiple-choice tasks rather than on understanding context, culture, and history, which are essential for evaluating how well an AI system comprehends the lived experiences of people in different regions.

India was chosen as the starting point for the initiative due to its linguistic diversity and growing user base. OpenAI said India has about a billion people who do not use English as their primary language and 22 official languages, including at least seven with more than 50 million speakers each. The country is also ChatGPT’s second-largest market.

The IndQA benchmark spans 2,278 questions across 12 Indian languages and 10 cultural domains, created in collaboration with 261 domain experts from across the country. It evaluates knowledge and reasoning about Indian culture and everyday life in languages such as Bengali, Hindi, Marathi, Telugu, Tamil, Gujarati, Kannada, Odia, Punjabi, Malayalam, English, and even Hinglish, reflecting the widespread use of code-switching in daily conversations.

IndQA’s coverage spans a broad range of culturally relevant subjects: architecture and design, arts and culture, everyday life, food and cuisine, history, law and ethics, literature and linguistics, media and entertainment, religion and spirituality, and sports and recreation.

Each data point within IndQA consists of a culturally grounded prompt in an Indian language, an English translation for verification, rubric criteria for grading, and an ideal response based on expert expectations.
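For readers who think in code, the four parts of a data point described above could be pictured roughly as the record below. This is an illustrative sketch only: the class names, field names, and placeholder values are assumptions made for this article, not OpenAI's published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricCriterion:
    """One expert-written grading criterion (hypothetical shape)."""
    description: str  # what an ideal response should include or avoid
    weight: float     # importance assigned by the expert


@dataclass(frozen=True)
class IndQAItem:
    """One benchmark entry; field names are assumed, not official."""
    prompt: str                          # question in an Indian language
    english_translation: str             # translation kept for verification
    rubric: tuple[RubricCriterion, ...]  # criteria used for grading
    ideal_response: str                  # answer reflecting expert expectations


# Placeholder values only; no real benchmark content is reproduced here.
item = IndQAItem(
    prompt="<prompt in an Indian language>",
    english_translation="<English translation>",
    rubric=(RubricCriterion("<what the answer should mention>", 2.0),),
    ideal_response="<expert-written ideal answer>",
)
```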

The benchmark uses a rubric-based evaluation method, in which each AI-generated answer is graded against expert-defined criteria. These criteria specify what an ideal response should include or avoid, and each is assigned a weight based on its importance. A model-based grader then assesses whether the AI has met each criterion, producing a cumulative score.
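The weighted-rubric idea can be sketched in a few lines of Python. The sketch rests on several assumptions that the article does not confirm: weights are positive, the cumulative score is normalised by the total weight, and the model-based grader is replaced by a simple keyword check so the example runs on its own. All names and the sample rubric are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class WeightedCriterion:
    description: str  # what the grader checks for in the response
    weight: float     # expert-assigned importance (assumed positive)


def rubric_score(
    response: str,
    rubric: Sequence[WeightedCriterion],
    judge: Callable[[str, WeightedCriterion], bool],
) -> float:
    """Return the share of total rubric weight earned by the response.

    Normalising by total weight is an assumption of this sketch.
    """
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric must carry positive total weight")
    earned = sum(c.weight for c in rubric if judge(response, c))
    return earned / total


def keyword_judge(response: str, criterion: WeightedCriterion) -> bool:
    """Stand-in for the model-based grader: a plain substring match."""
    return criterion.description.lower() in response.lower()


# Made-up rubric: the first criterion matters three times as much.
rubric = [
    WeightedCriterion("dosa", 3.0),
    WeightedCriterion("fermented", 1.0),
]
score = rubric_score("A dosa is a thin, crisp crepe.", rubric, keyword_judge)
print(score)  # 0.75: only the first criterion (weight 3 of 4) is met
```

In a real pipeline the `judge` argument would wrap a call to a grading model rather than a keyword match; keeping it as a parameter lets the scoring arithmetic be checked independently of whichever grader is used.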

OpenAI said IndQA represents a step forward in making AI systems more contextually aware and inclusive, with plans to create similar benchmarks for other languages and regions in the future.