
Data Scientist Platform Release Notes

10 August 2018

  • Realtime user play history
    • Main use case: filtering recently watched content out of recommendations
    • The last 24 hours of each user's media consumption, including anonymous users, is stored in Redis in real time
    • Currently enabled only for broadcasters running PEACH in production
    • From an endpoint, the history can be retrieved as follows:
      from pipe_algorithms_lib.history_utils import realtime_history
      # cookie_id is the user's pipe_c cookie; returns a list of media ids ordered from oldest to newest
      history = realtime_history(codops, cookie_id)
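Since the main use case is filtering recently watched content out of recommendations, that step might be sketched as follows. `filter_recently_watched` and its `max_recent` parameter are hypothetical helpers, not part of `pipe_algorithms_lib`. The sketch assumes only that `realtime_history` returns a list of media ids ordered from oldest to newest, as described above, and uses hard-coded ids in place of a real call.

```python
from typing import Iterable, List, Optional


def filter_recently_watched(
    recommendations: Iterable[str],
    history: Iterable[str],
    max_recent: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: drop recommended media ids found in the play history.

    history is assumed to be ordered oldest to newest (as returned by
    realtime_history), so the last max_recent items are the most recent plays.
    The ranking order of the remaining recommendations is preserved.
    """
    items = list(history)
    if max_recent is not None:
        # Guard against max_recent == 0: items[-0:] would return the whole list
        items = items[-max_recent:] if max_recent > 0 else []
    watched = set(items)  # set gives O(1) membership checks
    return [media_id for media_id in recommendations if media_id not in watched]


# Hard-coded ids standing in for realtime_history(codops, cookie_id)
history = ["m1", "m3"]
recommendations = ["m3", "m2", "m1", "m4"]
print(filter_recently_watched(recommendations, history))  # ['m2', 'm4']
```

Converting the history to a set keeps the filter linear in the number of recommendations. `max_recent` allows exclusion to be limited to the latest plays rather than the full 24-hour window.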
      

8 March 2018

  • List HDFS files from the Notebooks

    from pipe_algorithms_lib.hadoop import ls
    
    # Basic: list the root directory
    ls(path="/")
    
    # List a specific directory
    ls(path="/recsys/chrts/realtime")
    
    # List with all details
    ls(path="/recsys/chrts/realtime", all=True)
    
  • Custom Python environment for Notebooks & Tasks

    • This lets data scientists use third-party libraries in their code seamlessly on the platform.
    • A custom Python environment for endpoints was already made available a few weeks ago.
    • Data scientists can also use this to create their own libraries that factor out common code, improving maintainability, reusability and sharing across broadcasters and teams.
    • Common libraries can also be versioned in Git (using the EBU GitLab or public GitHub repositories).
    • How does it work?