-
Notifications
You must be signed in to change notification settings - Fork 65
feat: add more qa generation types #159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/gemini summary |
|
/gemini review |
Summary of ChangesThis pull request significantly enhances the question-answering generation capabilities by introducing support for fill-in-blank, multi-answer, and multi-choice question types. This expansion provides greater flexibility for users to create varied educational and assessment content. The changes include new generator implementations, updated output formatting logic to accommodate question options, and comprehensive example configurations to demonstrate the new functionalities. Highlights
Changelog
Activity
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces three new question-answering generation types: fill_in_blank, multi_answer, and multi_choice. The changes include new generator classes, corresponding prompt templates, and example configurations to demonstrate their usage. The core logic in GenerateService is updated to support these new methods, and BaseGenerator is refactored to handle formatting for questions with options.
The implementation is solid, but I've found a critical bug in the result formatting logic and several areas for improvement regarding code clarity, consistency, and maintainability. My comments include fixes for incorrect docstrings, misleading log messages, and inconsistencies in prompt templates and regular expressions. I've also suggested a refactoring for the generator factory in GenerateService to make it more scalable.
| @@ -0,0 +1,3 @@ | |||
| # Generate Multi-Answer QAs | |||
|
|
|||
| Multi-answer question answering (QA) involves generating questions that can have multiple valid answers. This is particularly useful in educational settings, surveys, and research where diverse perspectives are valuable. No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if not line: | ||
| continue | ||
| # Match patterns like "A. text" or "B. text" | ||
| if m := re.match(r"^([A-Z])[.\s]\s*(.*)$", line): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex r"^([A-Z])[.\s]\s*(.*)$" is used to parse options. However, the prompt specifies that there should be four options, which are typically labeled A, B, C, and D. Using [A-Z] is too broad and might accept invalid option letters. For consistency with MultiChoiceGenerator and to be more strict, consider changing it to r"^([A-D])[.\s]\s*(.*)$".
| if m := re.match(r"^([A-Z])[.\s]\s*(.*)$", line): | |
| if m := re.match(r"^([A-D])[.\s]\s*(.*)$", line): |
| if self.method == "atomic": | ||
| from graphgen.models import AtomicGenerator | ||
|
|
||
| self.generator = AtomicGenerator(self.llm_client) | ||
| elif self.method == "aggregated": | ||
| from graphgen.models import AggregatedGenerator | ||
|
|
||
| self.generator = AggregatedGenerator(self.llm_client) | ||
| elif self.method == "multi_hop": | ||
| from graphgen.models import MultiHopGenerator | ||
|
|
||
| self.generator = MultiHopGenerator(self.llm_client) | ||
| elif self.method == "cot": | ||
| from graphgen.models import CoTGenerator | ||
|
|
||
| self.generator = CoTGenerator(self.llm_client) | ||
| elif self.method in ["vqa"]: | ||
| elif self.method == "vqa": | ||
| from graphgen.models import VQAGenerator | ||
|
|
||
| self.generator = VQAGenerator(self.llm_client) | ||
| elif self.method == "multi_choice": | ||
| from graphgen.models import MultiChoiceGenerator | ||
|
|
||
| self.generator = MultiChoiceGenerator( | ||
| self.llm_client, | ||
| num_of_questions=generate_kwargs.get("num_of_questions", 5), | ||
| ) | ||
| elif self.method == "multi_answer": | ||
| from graphgen.models import MultiAnswerGenerator | ||
|
|
||
| self.generator = MultiAnswerGenerator( | ||
| self.llm_client, | ||
| num_of_questions=generate_kwargs.get("num_of_questions", 3), | ||
| ) | ||
| elif self.method == "fill_in_blank": | ||
| from graphgen.models import FillInBlankGenerator | ||
|
|
||
| self.generator = FillInBlankGenerator( | ||
| self.llm_client, | ||
| num_of_questions=generate_kwargs.get("num_of_questions", 5), | ||
| ) | ||
| else: | ||
| raise ValueError(f"Unsupported generation mode: {method}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This long if/elif chain for selecting the generator can be refactored to improve maintainability. Using a dictionary to map method names to generator classes and their arguments would make the code more scalable and easier to read, especially as more generators are added.
For example, you could define a map and use importlib to dynamically load the required class:
import importlib
GENERATOR_MAP = {
"atomic": ("graphgen.models", "AtomicGenerator", {}),
"multi_choice": (
"graphgen.models",
"MultiChoiceGenerator",
{"num_of_questions": 5}, # default value
),
# ... other generators
}
# in __init__
if self.method not in GENERATOR_MAP:
raise ValueError(f"Unsupported generation mode: {self.method}")
module_name, class_name, default_kwargs = GENERATOR_MAP[self.method]
# ... logic to create instanceCo-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…lab/GraphGen into feat/question-types
This PR enhances the question-answering generation capabilities by introducing support for fill-in-blank, multi-answer, and multi-choice question types.