feat: add more qa generation types #159

ChenZiHong-Gavin · 2026-01-15T08:12:02Z

This PR enhances the question-answering generation capabilities by introducing support for fill-in-blank, multi-answer, and multi-choice question types.

ChenZiHong-Gavin · 2026-01-15T10:40:48Z

/gemini summary

ChenZiHong-Gavin · 2026-01-15T10:40:55Z

/gemini review

gemini-code-assist · 2026-01-15T10:41:06Z

Summary of Changes

This pull request significantly enhances the question-answering generation capabilities by introducing support for fill-in-blank, multi-answer, and multi-choice question types. This expansion provides greater flexibility for users to create varied educational and assessment content. The changes include new generator implementations, updated output formatting logic to accommodate question options, and comprehensive example configurations to demonstrate the new functionalities.

Highlights

New QA Generation Types: Introduced three new question-answering generation methods: fill-in-blank, multi-answer, and multi-choice, significantly expanding the tool's capabilities for diverse content creation.
Example Configurations and Scripts: Added dedicated example configurations and shell scripts for each new QA type, making it easy for users to get started and understand their usage.
Enhanced Output Formatting: The BaseGenerator's format_generation_results method was updated to correctly incorporate options into the question text for Alpaca, Sharegpt, and ChatML output formats, ensuring comprehensive question presentation.
Dynamic Generator Loading: The GenerateService now dynamically imports and instantiates the appropriate generator class based on the specified method, improving modularity and extensibility.
New Prompt Templates: New prompt templates (in both English and Chinese) have been added for fill-in-blank, multi-answer, and multi-choice question generation, guiding the LLM to produce structured and accurate QA pairs.

Changelog

examples/generate/generate_fill_in_blank_qa/README.md
- Added README for fill-in-blank QA generation.
examples/generate/generate_fill_in_blank_qa/fill_in_blank_config.yaml
- Added configuration file for fill-in-blank QA generation.
examples/generate/generate_fill_in_blank_qa/generate_fill_in_blank.sh
- Added shell script to run fill-in-blank QA generation.
examples/generate/generate_multi_answer_qa/README.md
- Added README for multi-answer QA generation.
examples/generate/generate_multi_answer_qa/generate_multi_answer.sh
- Added shell script to run multi-answer QA generation.
examples/generate/generate_multi_answer_qa/multi_answer_config.yaml
- Added configuration file for multi-answer QA generation.
examples/generate/generate_multi_choice_qa/README.md
- Added README for multi-choice QA generation.
examples/generate/generate_multi_choice_qa/generate_multi_choice.sh
- Added shell script to run multi-choice QA generation.
examples/generate/generate_multi_choice_qa/multi_choice_config.yaml
- Added configuration file for multi-choice QA generation.
graphgen/bases/base_generator.py
- Modified format_generation_results to include options in the question text for Alpaca, Sharegpt, and ChatML formats when options are present in the QA data.
graphgen/models/init.py
- Imported FillInBlankGenerator, MultiAnswerGenerator, and MultiChoiceGenerator.
graphgen/models/generator/init.py
- Imported FillInBlankGenerator, MultiAnswerGenerator, and MultiChoiceGenerator.
graphgen/models/generator/fill_in_blank_generator.py
- Added new FillInBlankGenerator class for generating fill-in-blank questions.
- Implemented parse_response method to extract fill-in-blank QA pairs from LLM output.
- Implemented build_prompt method to construct prompts for fill-in-blank generation.
graphgen/models/generator/multi_answer_generator.py
- Added new MultiAnswerGenerator class for generating multi-answer questions.
- Implemented parse_response method to extract multi-answer QA pairs and options from LLM output.
- Implemented build_prompt method to construct prompts for multi-answer generation.
graphgen/models/generator/multi_choice_generator.py
- Added new MultiChoiceGenerator class for generating multi-choice questions.
- Implemented parse_response method to extract multi-choice QA pairs and options from LLM output.
- Implemented build_prompt method to construct prompts for multi-choice generation.
graphgen/operators/generate/generate_service.py
- Updated __init__ method to accept generate_kwargs for passing parameters to specific generators.
- Modified generator instantiation logic to dynamically import and select the correct generator based on the method parameter (e.g., 'multi_choice', 'multi_answer', 'fill_in_blank').
- Removed direct imports of individual generator classes, opting for dynamic imports.
graphgen/templates/init.py
- Imported new prompt templates: FILL_IN_BLANK_GENERATION_PROMPT, MAQ_GENERATION_PROMPT, MCQ_GENERATION_PROMPT.
graphgen/templates/generation/init.py
- Imported new prompt templates for fill-in-blank, multi-answer, and multi-choice generation.
graphgen/templates/generation/fill_in_blank_generation.py
- Added English and Chinese prompt templates for fill-in-blank question generation.
graphgen/templates/generation/multi_answer_generation.py
- Added English and Chinese prompt templates for multi-answer question generation.
graphgen/templates/generation/multi_choice_generation.py
- Added English and Chinese prompt templates for multi-choice question generation.
graphgen/templates/question_generation.py
- Removed deprecated question generation template file.

Activity

ChenZiHong-Gavin requested a summary of the pull request.
ChenZiHong-Gavin requested a review of the pull request.

gemini-code-assist

Code Review

This pull request introduces three new question-answering generation types: fill_in_blank, multi_answer, and multi_choice. The changes include new generator classes, corresponding prompt templates, and example configurations to demonstrate their usage. The core logic in GenerateService is updated to support these new methods, and BaseGenerator is refactored to handle formatting for questions with options.

The implementation is solid, but I've found a critical bug in the result formatting logic and several areas for improvement regarding code clarity, consistency, and maintainability. My comments include fixes for incorrect docstrings, misleading log messages, and inconsistencies in prompt templates and regular expressions. I've also suggested a refactoring for the generator factory in GenerateService to make it more scalable.

graphgen/bases/base_generator.py

gemini-code-assist · 2026-01-15T10:44:26Z

examples/generate/generate_multi_answer_qa/README.md

@@ -0,0 +1,3 @@
+# Generate Multi-Answer QAs
+
+Multi-answer question answering (QA) involves generating questions that can have multiple valid answers. This is particularly useful in educational settings, surveys, and research where diverse perspectives are valuable. 


This file is missing a final newline character. It's a common convention to end files with a newline to ensure consistency and prevent issues with some tools.

graphgen/models/generator/multi_answer_generator.py

gemini-code-assist · 2026-01-15T10:44:26Z

graphgen/models/generator/multi_answer_generator.py

+                if not line:
+                    continue
+                # Match patterns like "A. text" or "B. text"
+                if m := re.match(r"^([A-Z])[.\s]\s*(.*)$", line):


The regex r"^([A-Z])[.\s]\s*(.*)$" is used to parse options. However, the prompt specifies that there should be four options, which are typically labeled A, B, C, and D. Using [A-Z] is too broad and might accept invalid option letters. For consistency with MultiChoiceGenerator and to be more strict, consider changing it to r"^([A-D])[.\s]\s*(.*)$".

Suggested change

if m := re.match(r"^([A-Z])[.\s]\s*(.*)$", line):

if m := re.match(r"^([A-D])[.\s]\s*(.*)$", line):

graphgen/models/generator/multi_answer_generator.py

graphgen/models/generator/multi_choice_generator.py

gemini-code-assist · 2026-01-15T10:44:26Z

graphgen/operators/generate/generate_service.py

        if self.method == "atomic":
+            from graphgen.models import AtomicGenerator
+
            self.generator = AtomicGenerator(self.llm_client)
        elif self.method == "aggregated":
+            from graphgen.models import AggregatedGenerator
+
            self.generator = AggregatedGenerator(self.llm_client)
        elif self.method == "multi_hop":
+            from graphgen.models import MultiHopGenerator
+
            self.generator = MultiHopGenerator(self.llm_client)
        elif self.method == "cot":
+            from graphgen.models import CoTGenerator
+
            self.generator = CoTGenerator(self.llm_client)
-        elif self.method in ["vqa"]:
+        elif self.method == "vqa":
+            from graphgen.models import VQAGenerator
+
            self.generator = VQAGenerator(self.llm_client)
+        elif self.method == "multi_choice":
+            from graphgen.models import MultiChoiceGenerator
+
+            self.generator = MultiChoiceGenerator(
+                self.llm_client,
+                num_of_questions=generate_kwargs.get("num_of_questions", 5),
+            )
+        elif self.method == "multi_answer":
+            from graphgen.models import MultiAnswerGenerator
+
+            self.generator = MultiAnswerGenerator(
+                self.llm_client,
+                num_of_questions=generate_kwargs.get("num_of_questions", 3),
+            )
+        elif self.method == "fill_in_blank":
+            from graphgen.models import FillInBlankGenerator
+
+            self.generator = FillInBlankGenerator(
+                self.llm_client,
+                num_of_questions=generate_kwargs.get("num_of_questions", 5),
+            )
        else:
            raise ValueError(f"Unsupported generation mode: {method}")


This long if/elif chain for selecting the generator can be refactored to improve maintainability. Using a dictionary to map method names to generator classes and their arguments would make the code more scalable and easier to read, especially as more generators are added.

For example, you could define a map and use importlib to dynamically load the required class:

import importlib GENERATOR_MAP = { "atomic": ("graphgen.models", "AtomicGenerator", {}), "multi_choice": ( "graphgen.models", "MultiChoiceGenerator", {"num_of_questions": 5}, # default value ), # ... other generators } # in __init__ if self.method not in GENERATOR_MAP: raise ValueError(f"Unsupported generation mode: {self.method}") module_name, class_name, default_kwargs = GENERATOR_MAP[self.method] # ... logic to create instance

graphgen/templates/generation/multi_choice_generation.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

graphgen/bases/base_generator.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

…lab/GraphGen into feat/question-types

feat: add multi choice qa generation

d849e23

github-actions bot added documentation Improvements or additions to documentation core examples labels Jan 15, 2026

ChenZiHong-Gavin changed the title ~~feat: add multi choice qa generation~~ feat: add more qa generation types Jan 15, 2026

feat: add multi answer qa generation

d885096

ChenZiHong-Gavin marked this pull request as ready for review January 15, 2026 10:39

feat: add fill-in-blank qa generation

2a87d57

gemini-code-assist bot reviewed Jan 15, 2026

View reviewed changes

ChenZiHong-Gavin and others added 5 commits January 15, 2026 18:46

Update graphgen/bases/base_generator.py

64578e9

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update graphgen/models/generator/multi_answer_generator.py

087d095

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update graphgen/models/generator/multi_answer_generator.py

8b4326c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update graphgen/models/generator/multi_choice_generator.py

3837430

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update graphgen/templates/generation/multi_choice_generation.py

87b0cc3

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

github-code-quality bot found potential problems Jan 15, 2026

View reviewed changes

graphgen/bases/base_generator.py Fixed Show fixed Hide fixed

ChenZiHong-Gavin and others added 3 commits January 15, 2026 18:50

Update graphgen/models/generator/multi_answer_generator.py

a284d5c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

fix: fix typo

e8e185a

Merge branch 'feat/question-types' of https://github.com/open-science…

92f33d2

…lab/GraphGen into feat/question-types

ChenZiHong-Gavin merged commit 9c6f678 into main Jan 15, 2026
6 checks passed

ChenZiHong-Gavin deleted the feat/question-types branch January 15, 2026 11:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add more qa generation types #159

feat: add more qa generation types #159

ChenZiHong-Gavin commented Jan 15, 2026 •

edited

Loading

Uh oh!

ChenZiHong-Gavin commented Jan 15, 2026

Uh oh!

ChenZiHong-Gavin commented Jan 15, 2026

Uh oh!

gemini-code-assist bot commented Jan 15, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

gemini-code-assist bot Jan 15, 2026

Uh oh!

Uh oh!

gemini-code-assist bot Jan 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot Jan 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@@ -0,0 +1,3 @@
		# Generate Multi-Answer QAs

		Multi-answer question answering (QA) involves generating questions that can have multiple valid answers. This is particularly useful in educational settings, surveys, and research where diverse perspectives are valuable. No newline at end of file

	if m := re.match(r"^([A-Z])[.\s]\s(.)$", line):
	if m := re.match(r"^([A-D])[.\s]\s(.)$", line):

feat: add more qa generation types #159

feat: add more qa generation types #159

Conversation

ChenZiHong-Gavin commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChenZiHong-Gavin commented Jan 15, 2026

Uh oh!

ChenZiHong-Gavin commented Jan 15, 2026

Uh oh!

gemini-code-assist bot commented Jan 15, 2026

Summary of Changes

Highlights

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist bot Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist bot Jan 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ChenZiHong-Gavin commented Jan 15, 2026 •

edited

Loading