GPT4Rec Review

Intro

1. NLP-based models are used in personalized recommender systems by modeling user-item interaction sequences

2. Limitations: NLP-based models treat items as mere IDs (no semantic information is used) and rely on discriminative modeling

* discriminative / generative model(https://ratsgo.github.io/generative%20model/2017/12/17/compare/)

- incapable of fully leveraging the content info of items and the language modeling ability of NLP models

- unable to accommodate changing and growing item inventories -> hard to interpret the user interests that good recommendations depend on

3. Solution: GPT4Rec

Query Generation and Searching

- Replaces the discriminative model with a generative model: item titles are combined with a generation prompt to generate search queries

- The search queries are fed to a search engine, which returns ranked recommendation results

- item embedding vectors -> item titles / User embedding vectors -> Generated queries

- Advantages:

- utilizes semantic information in item titles, and so captures users' diverse interests

- adopts a multi-query beam search technique in query generation -> decodes users' multiple interests and improves recommendation diversity

- the generated queries are human-understandable and hold standalone value, so user interests can be interpreted

- query-and-search-based recommendation can also address the cold-start problem and the constantly changing item inventory

2. Methodology

2.1 Query Generation with the Language Model

The goal is

- to learn the user representation in the language space from the item interaction sequence

- to generate multiple queries that represent user interests.

=> Learning User/Item Representations in Language Space: item titles are used as the input sequence and queries are generated from them, which 'decodes' the user's interests

Prompt :

Previously, the customer has bought:
<ITEM TITLE 1>. <ITEM TITLE 2>...
In the future, the customer wants to buy
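
A minimal sketch of how the prompt above could be assembled from a user's item titles. The function name `build_prompt` and the example titles are made up for illustration, and joining titles with ". " is an assumption read off the template.

```python
def build_prompt(item_titles):
    """Format a user's purchased item titles into the generation prompt."""
    # Titles are joined with ". " following the template in this section.
    history = ". ".join(title.strip() for title in item_titles)
    return (
        "Previously, the customer has bought:\n"
        f"{history}.\n"
        "In the future, the customer wants to buy"
    )


# Hypothetical purchase history, for illustration only.
prompt = build_prompt(["Wireless Mouse", "USB-C Hub"])
print(prompt)
```

The language model is then expected to continue this text, and the continuation is used as the search query.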

2.2 Item Retrieval with the Search Engine

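The search engine plays the role of the discriminator

Search: the queries produced by beam search are used to retrieve items with BM25

BM25 scores a document by term frequency, inverse document frequency, and document length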
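
To make the three ingredients concrete, here is a small sketch of the standard Okapi BM25 formula. It is not the search engine used in the paper: the whitespace tokenizer, the default values `k1=1.2` and `b=0.75`, the non-negative IDF variant, and the example titles are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25.

    Tokenization is plain lowercase whitespace splitting (an assumption).
    """
    docs = [doc.lower().split() for doc in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents score lower.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical item titles, for illustration only.
titles = ["wireless gaming mouse", "usb-c hub adapter", "gaming keyboard"]
scores = bm25_scores("gaming mouse", titles)
```

Here `k1` controls how quickly repeated terms saturate and `b` controls how strongly document length is normalized.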

The items retrieved for each query are combined into the final recommendation set

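Given beam size $m$, generation score function $S(\cdot)$, and the candidate queries $Q^{(l)} = (q_1^{l}, q_2^{l}, \ldots, q_m^{l})$ with length $l$, the beam search algorithm updates $Q^{(l+1)}$ to be the top-$m$ queries of length $l+1$ that maximize $S(W_u, q)$, $q \in \{q : |q| = l+1,\ q_{1:l} \in Q^{(l)}\}$, where $W_u$ is the prompt built for user $u$.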
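
The update rule can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the score $S$ is assumed to be the cumulative log-probability, queries have a fixed length with no end-of-sequence handling, and `toy_model` is a made-up stand-in for the language model conditioned on the prompt.

```python
import math


def beam_search(next_token_logprobs, beam_size, max_len):
    """Return the top `beam_size` token sequences of length `max_len`.

    `next_token_logprobs(prefix)` is a hypothetical callback returning
    {token: log-probability} for the token that follows `prefix`.
    """
    beams = [((), 0.0)]  # (query tokens so far, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in next_token_logprobs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the top-m one-token extensions of the current candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def toy_model(prefix):
    """Made-up next-token distribution, for illustration only."""
    table = {
        (): {"gaming": 0.6, "office": 0.4},
        ("gaming",): {"mouse": 0.7, "chair": 0.3},
        ("office",): {"chair": 0.9, "mouse": 0.1},
    }
    return {tok: math.log(p) for tok, p in table[prefix].items()}


queries = beam_search(toy_model, beam_size=2, max_len=2)
```

With a beam size of 2 the search keeps both "gaming mouse" and "office chair", which is how several distinct interests can survive to the final query set.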

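The search engine takes each generated query as input and returns the most relevant items according to a matching score function.

A rank-based strategy is used to combine the search results of the individual queries. First, the top 𝐾/𝑚 items are taken from the search results of the query with the highest generation score; then the remaining queries are processed in order of their score ranking, each adding its top 𝐾/𝑚 items that are not already in the set.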
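
A short sketch of this merging step, under the assumptions that 𝐾 is divisible by 𝑚 and that the per-query result lists are already sorted by matching score. The function name and the item IDs are hypothetical.

```python
def merge_search_results(results_per_query, k):
    """Combine per-query rankings into a list of up to k unique items.

    `results_per_query` is ordered by generation score (best query first);
    each entry is that query's item list, ranked by matching score.
    """
    m = len(results_per_query)
    quota = k // m  # K/m items contributed by each query
    merged, seen = [], set()
    for ranked_items in results_per_query:
        added = 0
        for item in ranked_items:
            if added == quota:
                break
            if item not in seen:  # skip items already taken from earlier queries
                merged.append(item)
                seen.add(item)
                added += 1
    return merged


# Two queries, K = 4, so each query contributes 2 new items.
results = [["A", "B", "C"], ["B", "D", "E"]]
final = merge_search_results(results, k=4)
```

In this example the second query skips "B", which the first query already contributed, and adds "D" and "E" instead.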

2.3 Training Strategy

1. Fine-tuning the pre-trained GPT-2 model

Given the item interaction sequence $i_1, i_2, \ldots, i_T$ of each user $u$, we take the first $T-1$ item titles in the prompt shown in 2.1, then concatenate it with the title of the last item $i_T$ (the ground-truth answer) to form the training corpus used to fine-tune the pre-trained GPT-2 model.

Assumption: the most fine-grained and accurate search query is the target item title itself.
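
The corpus construction described here could look roughly like the following. The function name is hypothetical, and the single space between the prompt and the target title is an assumption; the notes do not specify the exact separator.

```python
def build_training_example(item_titles):
    """Turn one user's title sequence into a single fine-tuning text."""
    if len(item_titles) < 2:
        raise ValueError("need at least one history item and one target item")
    # The first T-1 titles form the history; the last title is the target.
    *history, target = item_titles
    prompt = (
        "Previously, the customer has bought:\n"
        + ". ".join(history) + ".\n"
        + "In the future, the customer wants to buy"
    )
    # The target title is appended as the text the model should learn to generate.
    return f"{prompt} {target}"


# Hypothetical user history, for illustration only.
example = build_training_example(["Wireless Mouse", "USB-C Hub", "Laptop Stand"])
```

Each user therefore contributes one text in which the last item title plays the role of the "ideal" query.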

2. Grid search over the BM25 hyper-parameters given the generated queries
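
As a rough sketch, the grid search can be written as a loop over candidate values of the two usual BM25 parameters, `k1` and `b`. The `evaluate` callback is hypothetical: in practice it would run retrieval with the generated queries and return a validation metric. The value grids and `toy_metric` below are placeholders, not values from the paper.

```python
from itertools import product


def grid_search(evaluate, k1_values, b_values):
    """Return the (k1, b) pair with the highest evaluation score.

    `evaluate(k1, b)` is a hypothetical callback that runs BM25 retrieval
    on the generated queries and returns a validation metric such as recall.
    """
    best_params, best_score = None, float("-inf")
    for k1, b in product(k1_values, b_values):
        score = evaluate(k1, b)
        if score > best_score:
            best_params, best_score = (k1, b), score
    return best_params, best_score


def toy_metric(k1, b):
    """Stand-in metric for illustration only; it peaks at k1=1.2, b=0.75."""
    return -((k1 - 1.2) ** 2 + (b - 0.75) ** 2)


best_params, best_score = grid_search(toy_metric, [0.8, 1.2, 1.6], [0.5, 0.75, 1.0])
```

Because the language model is already fixed at this stage, the generated queries only need to be produced once and can be reused for every parameter combination.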

3. Experiments

Recall: the fraction of the items a user is actually interested in that the model recommended
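
A minimal sketch of Recall@K for a single user, following the definition above. The function name and the example item IDs are made up for illustration.

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    top_k = set(recommended[:k])
    return len(top_k & set(relevant)) / len(relevant)


# One of the two relevant items ("B") is in the top 3, so recall is 0.5.
recall = recall_at_k(["A", "B", "C", "D"], relevant=["B", "E"], k=3)
```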

Diversity: does the model recommend diverse types of items?

Coverage: can the model recommend many items?

Performance improves as the number of queries increases
