
Ever heard of Hadoop, Spark, Flink or HDFS? Curious to finally find out what they are all about?

This workshop takes place at UNIFR / HEIA-FR on 19 and 20 October 2017. It is organised by IT-Valley in collaboration with the DAPLAB. It is intended for any developer who wants to become familiar with Big Data and Data Science technologies.

Location:

  • Room C00.11
  • Haute école d'ingénierie et d'architecture de Fribourg
  • Bd de Pérolles 80, 1705 Fribourg

It will be given in French, with support in English.

Schedule

Day 1:

  • Introduction: Big Data, Hadoop, HDFS, what are they?
  • HDFS: filesystem and command line usage
  • MapReduce:
    • theory: what is MapReduce?
    • practice: my first MapReduce application (Java)
  • Hive:
    • theory: what is Hive?
    • practice: Querying and manipulating data using Hive

Day 2:

  • Spark and Zeppelin:
    • getting started with Zeppelin (Python + pyspark)
    • Spark SQL: quick data discovery and visualisation (Python + pyspark)
  • LSA, Latent Semantic Analysis:
    • theory: what is it and what is it for?
    • practice: implementing a document search engine using LDA, Latent Dirichlet Allocation (Python + pyspark)
  • if time allows, a short tour of the DAPLAB

Slides

Requirements

For the workshop, you will need the following:

  • a laptop (Mac or Linux recommended)
  • the Java JDK, version 8 or above
  • a Java IDE:
  • Maven:
    • If you have IntelliJ, you can skip this step as it already ships with an embedded Maven
    • If you want Maven available from the command line as well, follow the installation instructions on the Maven website
  • Docker:
    • Windows installation guide
    • Mac installation guide
  • Snorkel:
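
Before the workshop, it can help to confirm that the command-line tools listed above are actually on your PATH. The following is a minimal sketch, not part of the workshop material: the executable names (`javac`, `mvn`, `docker`, `git`) are assumptions based on the usual names of these tools, and the function `find_tools` is a hypothetical helper. Finding `javac` does not prove the JDK is version 8 or above, and a missing `mvn` is fine if you rely on the Maven embedded in IntelliJ.

```python
import shutil

# Assumption: these are the usual executable names for each requirement.
REQUIRED_TOOLS = {
    "Java JDK": "javac",
    "Maven (optional with IntelliJ)": "mvn",
    "Docker": "docker",
    "git (optional)": "git",
}


def find_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Map each tool label to the path of its executable, or None if it is not on PATH."""
    return {label: which(command) for label, command in tools.items()}


if __name__ == "__main__":
    for label, path in find_tools().items():
        print(f"{label}: {path or 'NOT FOUND on PATH'}")
```

Run it with any Python 3 interpreter; each line shows where the tool was found, or that it is missing.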

Setting up Snorkel

You need Docker running on your machine.

Snorkel is a Docker container that lets you run Zeppelin locally.

  1. Download Snorkel:

    • If you have git installed, clone the following repository:
      git clone https://github.com/Sqooba/snorkel.git
      
    • If you don't have git installed, go to https://github.com/Sqooba/snorkel and select Clone or download > Download ZIP, then unzip.
  2. Open a command line and navigate into the snorkel folder.

  3. Build the Zeppelin image:

    • On Mac/Linux:
      ./build-images.sh
      
    • On Windows:
      ./build-images.cmd
      
  4. Start Zeppelin:

    • On Mac/Linux:
      ./zeppelin.sh --start
      
    • On Windows (from the Command Prompt or PowerShell):
      ./start-zeppelin.cmd
      
  5. Check that Zeppelin is running:

  6. Stop Zeppelin:

    • On Mac/Linux:
      ./zeppelin.sh --stop
      
    • On Windows (from the Command Prompt or PowerShell):
      ./stop-zeppelin.cmd
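
While Zeppelin is started (between steps 4 and 6), one way to check that it is running is to see whether its web interface answers over HTTP. The sketch below is only an illustration under a stated assumption: it uses `http://localhost:8080`, which is Zeppelin's default port, but the Snorkel scripts may publish a different one, so adjust `ZEPPELIN_URL` if needed. The function `is_reachable` is a hypothetical helper, not part of Snorkel.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: Zeppelin's default port 8080 is published on localhost.
# The Snorkel scripts may map a different port; change this if so.
ZEPPELIN_URL = "http://localhost:8080"


def is_reachable(url=ZEPPELIN_URL, timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, only with an error status code.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or malformed response.
        return False


if __name__ == "__main__":
    print("reachable" if is_reachable() else "not reachable")
```

A "not reachable" result right after starting can simply mean the container is still booting; waiting a few seconds and retrying is reasonable.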