Skip to content

Conversation

@ChenZiHong-Gavin
Copy link
Collaborator

@ChenZiHong-Gavin ChenZiHong-Gavin commented Jan 15, 2026

This PR enhances the question-answering generation capabilities by introducing support for fill-in-blank, multi-answer, and multi-choice question types.

@github-actions github-actions bot added documentation Improvements or additions to documentation core examples labels Jan 15, 2026
@ChenZiHong-Gavin ChenZiHong-Gavin changed the title feat: add multi choice qa generation feat: add more qa generation types Jan 15, 2026
@ChenZiHong-Gavin ChenZiHong-Gavin marked this pull request as ready for review January 15, 2026 10:39
@ChenZiHong-Gavin
Copy link
Collaborator Author

/gemini summary

@ChenZiHong-Gavin
Copy link
Collaborator Author

/gemini review

@gemini-code-assist
Copy link
Contributor

Summary of Changes

This pull request significantly enhances the question-answering generation capabilities by introducing support for fill-in-blank, multi-answer, and multi-choice question types. This expansion provides greater flexibility for users to create varied educational and assessment content. The changes include new generator implementations, updated output formatting logic to accommodate question options, and comprehensive example configurations to demonstrate the new functionalities.

Highlights

  • New QA Generation Types: Introduced three new question-answering generation methods: fill-in-blank, multi-answer, and multi-choice, significantly expanding the tool's capabilities for diverse content creation.
  • Example Configurations and Scripts: Added dedicated example configurations and shell scripts for each new QA type, making it easy for users to get started and understand their usage.
  • Enhanced Output Formatting: The BaseGenerator's format_generation_results method was updated to correctly incorporate options into the question text for Alpaca, Sharegpt, and ChatML output formats, ensuring comprehensive question presentation.
  • Dynamic Generator Loading: The GenerateService now dynamically imports and instantiates the appropriate generator class based on the specified method, improving modularity and extensibility.
  • New Prompt Templates: New prompt templates (in both English and Chinese) have been added for fill-in-blank, multi-answer, and multi-choice question generation, guiding the LLM to produce structured and accurate QA pairs.
Changelog
  • examples/generate/generate_fill_in_blank_qa/README.md
    • Added README for fill-in-blank QA generation.
  • examples/generate/generate_fill_in_blank_qa/fill_in_blank_config.yaml
    • Added configuration file for fill-in-blank QA generation.
  • examples/generate/generate_fill_in_blank_qa/generate_fill_in_blank.sh
    • Added shell script to run fill-in-blank QA generation.
  • examples/generate/generate_multi_answer_qa/README.md
    • Added README for multi-answer QA generation.
  • examples/generate/generate_multi_answer_qa/generate_multi_answer.sh
    • Added shell script to run multi-answer QA generation.
  • examples/generate/generate_multi_answer_qa/multi_answer_config.yaml
    • Added configuration file for multi-answer QA generation.
  • examples/generate/generate_multi_choice_qa/README.md
    • Added README for multi-choice QA generation.
  • examples/generate/generate_multi_choice_qa/generate_multi_choice.sh
    • Added shell script to run multi-choice QA generation.
  • examples/generate/generate_multi_choice_qa/multi_choice_config.yaml
    • Added configuration file for multi-choice QA generation.
  • graphgen/bases/base_generator.py
    • Modified format_generation_results to include options in the question text for Alpaca, Sharegpt, and ChatML formats when options are present in the QA data.
  • graphgen/models/init.py
    • Imported FillInBlankGenerator, MultiAnswerGenerator, and MultiChoiceGenerator.
  • graphgen/models/generator/init.py
    • Imported FillInBlankGenerator, MultiAnswerGenerator, and MultiChoiceGenerator.
  • graphgen/models/generator/fill_in_blank_generator.py
    • Added new FillInBlankGenerator class for generating fill-in-blank questions.
    • Implemented parse_response method to extract fill-in-blank QA pairs from LLM output.
    • Implemented build_prompt method to construct prompts for fill-in-blank generation.
  • graphgen/models/generator/multi_answer_generator.py
    • Added new MultiAnswerGenerator class for generating multi-answer questions.
    • Implemented parse_response method to extract multi-answer QA pairs and options from LLM output.
    • Implemented build_prompt method to construct prompts for multi-answer generation.
  • graphgen/models/generator/multi_choice_generator.py
    • Added new MultiChoiceGenerator class for generating multi-choice questions.
    • Implemented parse_response method to extract multi-choice QA pairs and options from LLM output.
    • Implemented build_prompt method to construct prompts for multi-choice generation.
  • graphgen/operators/generate/generate_service.py
    • Updated __init__ method to accept generate_kwargs for passing parameters to specific generators.
    • Modified generator instantiation logic to dynamically import and select the correct generator based on the method parameter (e.g., 'multi_choice', 'multi_answer', 'fill_in_blank').
    • Removed direct imports of individual generator classes, opting for dynamic imports.
  • graphgen/templates/init.py
    • Imported new prompt templates: FILL_IN_BLANK_GENERATION_PROMPT, MAQ_GENERATION_PROMPT, MCQ_GENERATION_PROMPT.
  • graphgen/templates/generation/init.py
    • Imported new prompt templates for fill-in-blank, multi-answer, and multi-choice generation.
  • graphgen/templates/generation/fill_in_blank_generation.py
    • Added English and Chinese prompt templates for fill-in-blank question generation.
  • graphgen/templates/generation/multi_answer_generation.py
    • Added English and Chinese prompt templates for multi-answer question generation.
  • graphgen/templates/generation/multi_choice_generation.py
    • Added English and Chinese prompt templates for multi-choice question generation.
  • graphgen/templates/question_generation.py
    • Removed deprecated question generation template file.
Activity
  • ChenZiHong-Gavin requested a summary of the pull request.
  • ChenZiHong-Gavin requested a review of the pull request.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces three new question-answering generation types: fill_in_blank, multi_answer, and multi_choice. The changes include new generator classes, corresponding prompt templates, and example configurations to demonstrate their usage. The core logic in GenerateService is updated to support these new methods, and BaseGenerator is refactored to handle formatting for questions with options.

The implementation is solid, but I've found a critical bug in the result formatting logic and several areas for improvement regarding code clarity, consistency, and maintainability. My comments include fixes for incorrect docstrings, misleading log messages, and inconsistencies in prompt templates and regular expressions. I've also suggested a refactoring for the generator factory in GenerateService to make it more scalable.

@@ -0,0 +1,3 @@
# Generate Multi-Answer QAs

Multi-answer question answering (QA) involves generating questions that can have multiple valid answers. This is particularly useful in educational settings, surveys, and research where diverse perspectives are valuable. No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This file is missing a final newline character. It's a common convention to end files with a newline to ensure consistency and prevent issues with some tools.

if not line:
continue
# Match patterns like "A. text" or "B. text"
if m := re.match(r"^([A-Z])[.\s]\s*(.*)$", line):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex r"^([A-Z])[.\s]\s*(.*)$" is used to parse options. However, the prompt specifies that there should be four options, which are typically labeled A, B, C, and D. Using [A-Z] is too broad and might accept invalid option letters. For consistency with MultiChoiceGenerator and to be more strict, consider changing it to r"^([A-D])[.\s]\s*(.*)$".

Suggested change
if m := re.match(r"^([A-Z])[.\s]\s*(.*)$", line):
if m := re.match(r"^([A-D])[.\s]\s*(.*)$", line):

Comment on lines 26 to 68
if self.method == "atomic":
from graphgen.models import AtomicGenerator

self.generator = AtomicGenerator(self.llm_client)
elif self.method == "aggregated":
from graphgen.models import AggregatedGenerator

self.generator = AggregatedGenerator(self.llm_client)
elif self.method == "multi_hop":
from graphgen.models import MultiHopGenerator

self.generator = MultiHopGenerator(self.llm_client)
elif self.method == "cot":
from graphgen.models import CoTGenerator

self.generator = CoTGenerator(self.llm_client)
elif self.method in ["vqa"]:
elif self.method == "vqa":
from graphgen.models import VQAGenerator

self.generator = VQAGenerator(self.llm_client)
elif self.method == "multi_choice":
from graphgen.models import MultiChoiceGenerator

self.generator = MultiChoiceGenerator(
self.llm_client,
num_of_questions=generate_kwargs.get("num_of_questions", 5),
)
elif self.method == "multi_answer":
from graphgen.models import MultiAnswerGenerator

self.generator = MultiAnswerGenerator(
self.llm_client,
num_of_questions=generate_kwargs.get("num_of_questions", 3),
)
elif self.method == "fill_in_blank":
from graphgen.models import FillInBlankGenerator

self.generator = FillInBlankGenerator(
self.llm_client,
num_of_questions=generate_kwargs.get("num_of_questions", 5),
)
else:
raise ValueError(f"Unsupported generation mode: {method}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This long if/elif chain for selecting the generator can be refactored to improve maintainability. Using a dictionary to map method names to generator classes and their arguments would make the code more scalable and easier to read, especially as more generators are added.

For example, you could define a map and use importlib to dynamically load the required class:

import importlib

GENERATOR_MAP = {
    "atomic": ("graphgen.models", "AtomicGenerator", {}),
    "multi_choice": (
        "graphgen.models",
        "MultiChoiceGenerator",
        {"num_of_questions": 5}, # default value
    ),
    # ... other generators
}

# in __init__
if self.method not in GENERATOR_MAP:
    raise ValueError(f"Unsupported generation mode: {self.method}")

module_name, class_name, default_kwargs = GENERATOR_MAP[self.method]
# ... logic to create instance

ChenZiHong-Gavin and others added 5 commits January 15, 2026 18:46
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
ChenZiHong-Gavin and others added 3 commits January 15, 2026 18:50
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ChenZiHong-Gavin ChenZiHong-Gavin merged commit 9c6f678 into main Jan 15, 2026
6 checks passed
@ChenZiHong-Gavin ChenZiHong-Gavin deleted the feat/question-types branch January 15, 2026 11:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core documentation Improvements or additions to documentation examples

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants