What is the Sandbox

From ESSnet Big Data

The Sandbox is a distributed computing environment hosted at the Irish Centre for High-End Computing (ICHEC). It is a computing cluster made up of six nodes, connected to each other to form a single environment.

The core software is an installation of Hadoop (Hortonworks Data Platform), the industry-standard environment for processing Big Data. Hadoop is built on a distributed file system (HDFS), into which you can load your data sets and then analyze them using the various tools available.
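As a rough illustration of loading data into the distributed file system from the Sandbox command line, the sketch below wraps three standard Hadoop file system shell commands (hdfs dfs -mkdir, -put and -ls) in a small shell function. The function name upload_to_hdfs and the example paths are hypothetical; the sketch assumes the hdfs client is on your PATH once you are logged in over SSH and that you can write to the target directory.

```shell
# Hypothetical helper: copy a local file into an HDFS directory, then list it.
# Assumes the Hadoop client command "hdfs" is available (e.g. on the Sandbox
# command line after logging in via SSH).
upload_to_hdfs() {
  local local_file="$1"
  local hdfs_dir="$2"
  # Create the target directory (and any missing parents) in HDFS.
  hdfs dfs -mkdir -p "$hdfs_dir" &&
    # Upload the file; -f overwrites an existing copy with the same name.
    hdfs dfs -put -f "$local_file" "$hdfs_dir" &&
    # Show the directory contents to confirm the upload.
    hdfs dfs -ls "$hdfs_dir"
}

# Example usage (placeholder paths):
# upload_to_hdfs mydata.csv /user/<username>/mydata
```

Each step only runs if the previous one succeeded, so a failed directory creation stops the upload instead of producing a confusing second error.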

Besides Hadoop, the software installed in the Sandbox includes R (with a specific package for handling big data sets through Hadoop) and Elasticsearch, an indexing engine for storing and retrieving unstructured data.


The Sandbox environment can be accessed in two different ways: 
1. Web Interfaces

HUE: allows you to upload and download data, run Pig and Hive scripts, and run processing jobs. The web interface also includes dedicated functions for browsing the distributed file system and monitoring the state of jobs. This interface will address the needs of most users.

RStudio: a browser-based development environment for R.

Authorised users can access the web interfaces via a web browser by connecting to the given URLs and typing in the credentials received from ICHEC.



2. Command line interface: accessible via an SSH connection. Although you will be able to do most of your work through the web interfaces, you might need the command line for more in-depth control. The command line is also the access point to R and Spark.


What is SSH? Secure Shell (SSH) is a program for logging into another computer over a network, executing commands on a remote machine, and moving files from one machine to another. It provides strong authentication and secure communication over insecure channels. SSH also provides secure X connections and secure forwarding of arbitrary TCP connections. To access the Sandbox using SSH, users need to generate an SSH key and send the public key to support for installation on the Sandbox. SSH is installed as standard on Linux and Mac OS X, and a free client, PuTTY, can be downloaded for Windows.

SSH access requires the generation of an SSH key pair, whose public key is installed on the Sandbox server. This allows password-less access to the server.
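On Linux and Mac OS X the key pair can be created with the OpenSSH ssh-keygen tool instead of PuTTYgen. The sketch below is one possible way to do it, not an ICHEC-prescribed procedure: the function name make_sandbox_key, the default key file ~/.ssh/sandbox_key and the choice of a 4096-bit RSA key are assumptions, so check with ICHEC support if a particular key type is required.

```shell
# Hypothetical helper: create an SSH key pair for the Sandbox and print the public key.
# Usage: make_sandbox_key [key file] [extra ssh-keygen options...]
make_sandbox_key() {
  local key="${1:-$HOME/.ssh/sandbox_key}"   # example path, not required by ICHEC
  [ "$#" -gt 0 ] && shift
  # Make sure the key directory exists and is private to you.
  mkdir -p -m 700 "$(dirname "$key")"
  # ssh-keygen prompts for a passphrase: choose one and remember it,
  # because it is asked for every time you connect.
  ssh-keygen -t rsa -b 4096 -f "$key" "$@" &&
    # The .pub file holds the single line to send to ICHEC; never send the private key.
    cat "$key.pub"
}

# Example usage:
# make_sandbox_key
```

ssh-keygen writes two files: the private key (here ~/.ssh/sandbox_key), which stays on your machine, and the public key (~/.ssh/sandbox_key.pub), whose one-line content is what ICHEC adds to the server's list of authorized keys.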

The following tutorial illustrates how to generate and use SSH keys on Windows: http://www.howtoforge.com/ssh_key_based_logins_putty

Once you have generated the public key (Step 5 of the tutorial), it must be sent to ICHEC for inclusion in the server's list of authorized keys. This is the text in the box named "Public key for pasting into OpenSSH authorized_keys file" (see the image under "Then copy the public key from the PuTTYgen window:").

When you receive the acknowledgement from ICHEC, you will be able to connect to the Sandbox command line.

On Linux/Mac, issue the following command on the command line:

ssh <username>@unece-sandbox.ichec.ie -i <path to the private key>
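If you connect often from Linux/Mac, the options of the command above can be stored in the OpenSSH client configuration file so that a short alias is enough. The sketch below appends such an entry; the alias sandbox and the function name add_sandbox_host are hypothetical choices, while Host, HostName, User and IdentityFile are standard ssh_config keywords.

```shell
# Hypothetical helper: append a "sandbox" host alias to an OpenSSH client config.
# Usage: add_sandbox_host <username> <path to private key> [config file]
add_sandbox_host() {
  local user="$1"
  local key="$2"
  local config="${3:-$HOME/.ssh/config}"
  mkdir -p -m 700 "$(dirname "$config")"
  # Unquoted EOF: $user and $key are expanded into the written text.
  cat >> "$config" <<EOF
Host sandbox
    HostName unece-sandbox.ichec.ie
    User $user
    IdentityFile $key
EOF
}

# Example usage, after which "ssh sandbox" replaces the full command:
# add_sandbox_host <username> ~/.ssh/sandbox_key
```

ssh uses the first value it finds for each option, so if your config already contains a broad "Host *" section earlier in the file, settings there take precedence over the ones appended here.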

On Windows, follow the tutorial from Step 7.

You will be asked for the passphrase you chose when generating the key pair (please note: this is NOT the password for the web interface!).