openrec.legacy.utils.dataset module
class openrec.legacy.utils.dataset.Dataset(raw_data, max_user, max_item, name='dataset')
Bases: object
The Dataset class stores a sequence of data points for training or evaluation.
Parameters:
- raw_data (numpy structured array) – Input raw data.
- max_user (int) – Maximum number of users in the recommendation system.
- max_item (int) – Maximum number of items in the recommendation system.
- name (str) – Name of the dataset.
Notes
The Dataset class expects raw_data as a numpy structured array, where each row represents a data point and contains at least two keys:
- user_id: the user involved in the interaction.
- item_id: the item involved in the interaction.
raw_data may contain other keys, such as timestamp and location, depending on the use case of the recommendation system. Each user should be uniquely and numerically indexed from 0 to total_number_of_users - 1. Items should be indexed likewise.
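As a minimal sketch of the input format described in these notes, the following builds a structured array with user_id, item_id, and an optional timestamp key, remapping arbitrary raw ids to contiguous indices starting at 0. The sample ids, the variable names, and the choice of np.unique for the remapping are illustrative assumptions, not part of OpenRec. Passing the number of distinct users and items as max_user and max_item is likewise an assumption based on the parameter descriptions above.

```python
import numpy as np

# Hypothetical raw interactions with arbitrary, non-contiguous ids.
raw_users = np.array([1001, 1001, 2040, 3999])
raw_items = np.array([77, 12, 77, 500])
timestamps = np.array([10, 11, 12, 13])

# return_inverse=True yields, for every entry, the position of its id in the
# sorted array of unique ids, i.e. a contiguous index in 0..n-1.
unique_users, user_idx = np.unique(raw_users, return_inverse=True)
unique_items, item_idx = np.unique(raw_items, return_inverse=True)

# One row per data point; user_id and item_id are the two required keys,
# timestamp is an optional extra key.
dtype = [("user_id", np.int32), ("item_id", np.int32), ("timestamp", np.int64)]
raw_data = np.zeros(len(raw_users), dtype=dtype)
raw_data["user_id"] = user_idx
raw_data["item_id"] = item_idx
raw_data["timestamp"] = timestamps

# Assumed here: max_user / max_item are the counts of distinct users / items.
max_user = len(unique_users)
max_item = len(unique_items)

# With OpenRec installed, the array could then be wrapped using the
# constructor signature documented above:
# from openrec.legacy.utils.dataset import Dataset
# dataset = Dataset(raw_data, max_user=max_user, max_item=max_item, name="train")
```

Keeping unique_users and unique_items around makes it possible to translate indices back to the original ids later.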
max_item()
Maximum number of items.
Returns: Maximum number of items.
Return type: int
max_user()
Maximum number of users.
Returns: Maximum number of users.
Return type: int
shuffle()
Shuffle the dataset entries.
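This page does not say how shuffle() is implemented. As an illustration only of what shuffling dataset entries means for a structured array, the sketch below reorders whole rows with a random permutation, so each user_id stays paired with its item_id. The sample data, the seed, and the use of numpy's Generator.permutation are assumptions and may differ from what Dataset.shuffle() does internally.

```python
import numpy as np

# Hypothetical data points in the format expected by Dataset.
dtype = [("user_id", np.int32), ("item_id", np.int32)]
raw_data = np.array([(0, 1), (0, 0), (1, 1), (2, 2)], dtype=dtype)

# A fixed seed keeps the illustration reproducible.
rng = np.random.default_rng(0)

# Indexing with a permutation of row positions reorders entire rows,
# leaving the fields within each row untouched.
order = rng.permutation(len(raw_data))
shuffled = raw_data[order]
```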