openrec.legacy.utils.dataset module

class openrec.legacy.utils.dataset.Dataset(raw_data, max_user, max_item, name='dataset')

Bases: object

The Dataset class stores a sequence of data points for training or evaluation.

Parameters:
  • raw_data (numpy structured array) – Input raw data.
  • max_user (int) – Maximum number of users in the recommendation system.
  • max_item (int) – Maximum number of items in the recommendation system.
  • name (str) – Name of the dataset.


The Dataset class expects raw_data as a numpy structured array, where each row represents a data point and contains at least two keys:

  • user_id: the user involved in the interaction.
  • item_id: the item involved in the interaction.

raw_data may contain other keys, such as timestamp and location, depending on the use case of the recommendation system. Users should be uniquely and numerically indexed from 0 to total_number_of_users - 1. Items should be indexed likewise, from 0 to total_number_of_items - 1.
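As an illustration of the expected input, the sketch below builds a small structured array with the two required keys plus an optional timestamp. The interaction values are made up, and deriving max_user and max_item from the largest index is an assumption that holds only when the ids are already contiguous and 0-based. The final constructor call is left as a comment because it requires OpenRec to be installed.

```python
import numpy as np

# Hypothetical interaction log: (user_id, item_id, timestamp) triples.
interactions = [
    (0, 2, 1500000000),
    (1, 0, 1500000060),
    (2, 1, 1500000120),
    (0, 1, 1500000180),
]

# Each row is one data point; user_id and item_id are the required keys,
# timestamp is an optional extra key.
raw_data = np.array(
    interactions,
    dtype=[("user_id", np.int32), ("item_id", np.int32), ("timestamp", np.int64)],
)

# Assumption: ids are contiguous and start at 0, so the counts are
# the largest index plus one.
max_user = int(raw_data["user_id"].max()) + 1
max_item = int(raw_data["item_id"].max()) + 1

# Sanity check that every index in [0, max_user) and [0, max_item) is used.
assert np.unique(raw_data["user_id"]).tolist() == list(range(max_user))
assert np.unique(raw_data["item_id"]).tolist() == list(range(max_item))

# With OpenRec installed, the array could then be wrapped as:
# from openrec.legacy.utils.dataset import Dataset
# dataset = Dataset(raw_data, max_user, max_item, name='dataset')
```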


Maximum number of items.

Returns: Maximum number of items.
Return type: int

Maximum number of users.

Returns: Maximum number of users.
Return type: int

Shuffle the dataset entries.
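For readers who want to see what shuffling the entries of a structured array can look like, here is a minimal NumPy-only sketch. It is not taken from the Dataset implementation, which may shuffle differently (for example, in place). The sample rows and the fixed seed are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data points with the two required keys.
entries = np.array(
    [(0, 2), (1, 0), (2, 1), (0, 1)],
    dtype=[("user_id", np.int32), ("item_id", np.int32)],
)

# Draw a random permutation of row positions and reorder whole rows,
# so each user_id stays paired with its item_id.
rng = np.random.default_rng(0)  # fixed seed only for reproducibility
perm = rng.permutation(len(entries))
shuffled = entries[perm]
```

Indexing with a permutation returns a reordered copy and leaves the original array untouched.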