Session Graph Understanding

Code and data for the intention graph construction pipeline and the recommendation experiments from the paper "Intention Knowledge Graph Construction for User Intention Relation Modeling" (arXiv:2412.11500), accepted by EACL 2026.

Citation

If you use this code or data, please cite:

@misc{bai2025intentionknowledgegraphconstruction,
      title={Intention Knowledge Graph Construction for User Intention Relation Modeling},
      author={Jiaxin Bai and Zhaobo Wang and Junfei Cheng and Dan Yu and Zerui Huang and Weiqi Wang and Xin Liu and Chen Luo and Yanming Zhu and Bo Li and Yangqiu Song},
      year={2025},
      eprint={2412.11500},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.11500},
}

Repository layout (high level)

  • rec_model/: session-based recommendation experiments.
  • discourse_model/: VERA-based relation scoring scripts.
  • generation_results/, annotation/, data_preprocess/: intention generation and annotation utilities.
  • prompting.py, gpt35_prompting.py, gpt4_prompting.py: LLM prompting code.
  • answer_process.py: post-processing that builds the intention graph outputs.

Graph construction (prompting)

The intention graph is built by prompting LLMs to generate intentions and then post-processing the outputs into structured triples.

  1. Prompting for intentions:
  • prompting.py is a minimal runner using Azure OpenAI (see openai.api_base and openai.api_key).
  • The example invocation in prompting.py writes JSONL to generation_results/, one JSON object per line with the session, prompt, and LLM answer (see the sketch after this list).
  • gpt35_prompting.py / gpt4_prompting.py are alternative entry points.
  2. Parsing and graph prep:
  • answer_process.py parses the raw answers into a triple file and can create intermediate artifacts (e.g., result_triple.txt and pickles for sessions).
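
For reference, a generation record might look like the following. This is a hypothetical sketch; the exact field names depend on how prompting.py assembles each JSON object, so check the script before relying on them:

{"session": ["wireless mouse", "usb hub"], "prompt": "What is the user's intention ...", "answer": "The user intends to set up a home office workstation."}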

VERA model runs (relation evaluation)

The VERA-based relation scoring scripts live in discourse_model/.

Example (single split):

cd discourse_model
CUDA_VISIBLE_DEVICES=0 python vera_evaluation.py -s 0

Notes:

  • vera_evaluation.py loads the VERA encoder from liujch1998/vera-base and expects a fine-tuned checkpoint at /data/jbai/cjf/Vera/vera_best_model.pth. Update that path if needed.
  • The input intention list is read from data_preprocess/generation_results/gpt-35-turbo_answer_<split>_intentions.json (see the loading sketch after this list).
  • Convenience scripts vera_sampling_*.sh run different splits.
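
To sanity-check an input file before a run, you can load it directly. This is a minimal sketch that assumes the file is plain JSON; the actual schema is defined by the generation step, so adjust the field access accordingly:

import json

# Hypothetical check: load the intention list for split 0 and report its size.
split = 0
path = f"data_preprocess/generation_results/gpt-35-turbo_answer_{split}_intentions.json"
with open(path) as f:
    intentions = json.load(f)
print(f"split {split}: {len(intentions)} entries")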

Recommendation experiments (rec_model/)

Data

  • The default dataset is rec_model/data/m2.txt.
  • Full raw data files are available from the linked Google Drive folder.
  • Format: one session per line as user_id item_id_1 item_id_2 ... item_id_n (space-separated integers); a parsing sketch follows this list.
  • rec_model/mat_m2_seqf.npz and rec_model/mat_m2.npz are sparse matrices used by the model code. Keep them in rec_model/ when running experiments.
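
Since each line is just space-separated integers, reading the sessions takes a few lines. A minimal sketch (the actual loader lives in rec_model/utils.py and may differ):

# Hypothetical reader for rec_model/data/m2.txt; format as described above.
sessions = []
with open("rec_model/data/m2.txt") as f:
    for line in f:
        user_id, *item_ids = map(int, line.split())
        sessions.append((user_id, item_ids))
print(f"loaded {len(sessions)} sessions")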

Run

From the repo root:

cd rec_model
python main.py --data_dir ./data/ --data_name m2 --gpu_id 0

Key arguments in rec_model/main.py (an example invocation with overrides follows this list):

  • --max_seq_length: maximum session length (default: 20).
  • --hidden_size, --num_hidden_layers, --num_attention_heads: SASRec encoder size.
  • --batch_size, --epochs, --lr: training hyperparameters.
  • --no_cuda: force CPU if needed.
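
For example, to train with a longer maximum session length and custom training hyperparameters (the values here are illustrative, not recommended defaults):

cd rec_model
python main.py --data_dir ./data/ --data_name m2 --gpu_id 0 --max_seq_length 50 --batch_size 256 --epochs 100 --lr 0.001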

Outputs

  • Logs are written to rec_model/logs_m2.txt.
  • Validation/test metrics are printed every epoch.

Notes

  • Splits are random with a 0.8/0.1/0.1 ratio by default; see get_user_seqs_split in rec_model/utils.py.
  • Training sessions are augmented with all prefix subsequences of length >= 2 (see the sketch below).
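
Prefix augmentation turns one session into several training examples, one per prefix. A minimal sketch of the idea (the actual implementation in rec_model/utils.py may differ in details):

# Hypothetical illustration: [3, 7, 2, 9] -> [3, 7], [3, 7, 2], [3, 7, 2, 9]
def prefix_augment(session):
    return [session[:k] for k in range(2, len(session) + 1)]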
