Ever heard of Hadoop, Spark, Flink, or HDFS? Curious to finally find out what they are all about?
This workshop takes place at UNIFR / HEIA-FR on the 19th and 20th of October 2017. It is organised by IT-Valley in collaboration with the DAPLAB. It is intended for any developer who wants to become familiar with Big Data and Data Science technologies.
Location:
- Room C00.11
- Haute école d'ingénierie et d'architecture de Fribourg
- Bd de Pérolles 80, 1705 Fribourg
It will be given in French, with support in English.
Schedule
Day 1:
- Introduction: Big Data, Hadoop, HDFS, what are they?
- HDFS: filesystem and command line usage
- MapReduce:
  - theory: what is MapReduce?
  - practice: my first MapReduce application (Java)
- Hive:
  - theory: what is Hive?
  - practice: querying and manipulating data using Hive
Day 2:
- Spark and Zeppelin:
  - getting started with Zeppelin (Python + PySpark)
  - Spark SQL: quick data discovery and visualisation (Python + PySpark)
- LSA, Latent Semantic Analysis:
  - theory: what is it and what is it for?
  - practice: implementing a document search engine using LDA, Latent Dirichlet Allocation (Python + PySpark)
- if there is some time left, a little tour of the DAPLAB
Slides
Requirements
For the workshop, you will need the following:
- a laptop (recommended: Mac or Linux)
- the Java JDK version 8 or above
- a Java IDE:
  - if you don't already have one, please install the IntelliJ Community Edition
- Maven:
  - if you have IntelliJ, you can skip this step as it already ships with an embedded Maven
  - if you want Maven available from the command line as well, follow the installation instructions on the Maven website
- Docker:
- Snorkel:
  - follow the instructions at https://github.com/Sqooba/snorkel or have a look at the section below.
  - For Windows, I added some scripts at https://github.com/derlin/snorkel; the pull request is still in review. Use mine for now.
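To quickly see which of the command-line tools listed above are already installed, a small script along these lines may help. It is only a sketch: the executable names (`java`, `docker`, `mvn`, `git`) are assumptions about how the tools are usually exposed on the PATH, the split into required and optional is my reading of the list above, and the script does not verify the Java version or that Docker is actually running.

```python
import shutil

# Executable names are assumptions about how each tool appears on the PATH.
REQUIRED_TOOLS = ["java", "docker"]
# Maven ships embedded in IntelliJ and git is only one way to get Snorkel,
# so both are treated as optional here.
OPTIONAL_TOOLS = ["mvn", "git"]


def find_tools(names):
    """Map each executable name to its full path, or None if not on the PATH."""
    return {name: shutil.which(name) for name in names}


def report(required=REQUIRED_TOOLS, optional=OPTIONAL_TOOLS):
    """Return one human-readable status line per tool."""
    lines = []
    for name, path in find_tools(required).items():
        lines.append(f"{name}: {path or 'MISSING (required)'}")
    for name, path in find_tools(optional).items():
        lines.append(f"{name}: {path or 'not found (optional)'}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running the file with `python` prints one line per tool; anything reported as missing still needs to be installed before the workshop.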
Setting up Snorkel
You need Docker running on your machine.
Snorkel is a Docker container that lets you run Zeppelin locally.
- Download snorkel:
  - If you have git installed, clone the following repository:
    git clone https://github.com/Sqooba/snorkel.git
  - If you don't have git installed, go to https://github.com/Sqooba/snorkel and select Clone or download > Download ZIP, then unzip.
- Open a command line and navigate inside the snorkel folder.
- Build the zeppelin image:
  - On Mac/Linux:
    ./build-images.sh
  - On Windows:
    ./build-images.cmd
- Start zeppelin:
  - On Mac/Linux:
    ./zeppelin.sh --start
  - On Windows (from the command prompt or PowerShell):
    ./start-zeppelin.cmd
- Check that zeppelin is running:
  - go to http://localhost:8080/, you should see a zeppelin welcome screen.
- Stop zeppelin:
  - On Mac/Linux:
    ./zeppelin.sh --stop
  - On Windows (from the command prompt or PowerShell):
    ./stop-zeppelin.cmd
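The check in the steps above (opening http://localhost:8080/ while Zeppelin is running) can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `is_up` and the 3-second timeout are illustrative choices, and the script only tells you that some HTTP server answers at that address, not that it is Zeppelin.

```python
import urllib.error
import urllib.request

ZEPPELIN_URL = "http://localhost:8080/"  # address given in the steps above


def is_up(url=ZEPPELIN_URL, timeout=3.0):
    """Return True if an HTTP server answers at `url` with a success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Zeppelin reachable" if is_up() else "Zeppelin not reachable")
```

Run it after starting Zeppelin, and again after stopping it to confirm the container is really down. Zeppelin can take a little while to come up after the start script returns, so a first negative result right after starting is not necessarily a failure.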