Getting data from X-Sources to Google’s Colab

We will explore several ways of reading data into a Colab notebook.

What is Google Colab?

Google Colab is a free, cloud-based programming environment built around notebooks, much like Jupyter. It has recently gained a lot of popularity among developers (mostly data enthusiasts) by providing free GPU (Graphics Processing Unit) and TPU (Tensor Processing Unit) access, which for suitable workloads can cut computation time by an order of magnitude or more.

Firing up a Colab Notebook

In today's post, we will see how you can access your personal data and load it into Colab for some interesting research projects.

Let's fill in the X in the heading. We will see how to load data from Google Drive, the local system, Google Sheets, S3, and Dropbox.

You can find all the code in this Notebook

File System

# Colab provides the `files` helper for uploading data from the local file system
from google.colab import files

uploaded = files.upload()
all_data = ''
# `uploaded` is a dict that maps each file name to that file's content (bytes)
for data_file in uploaded.keys():
    print('Reading file {}'.format(data_file))
    # assumes the uploaded files are UTF-8 encoded text
    all_data += uploaded.get(data_file).decode('utf-8')
    all_data += '\n'
    print('Total length read so far is {}'.format(len(all_data)))
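The loop above can also be wrapped in a small function, which makes it easy to try outside Colab. This is only a sketch: `combine_uploaded` is a hypothetical helper name, the `sample` dict stands in for what `files.upload()` returns, and UTF-8 text files are assumed.

```python
def combine_uploaded(uploaded, encoding='utf-8'):
    """Join the contents of an upload dict into one newline-separated string.

    `uploaded` is expected to look like the dict returned by
    `google.colab.files.upload()`: file name -> file content as bytes.
    """
    parts = []
    for name, content in uploaded.items():
        # Decode bytes to text; str values are passed through unchanged.
        text = content.decode(encoding) if isinstance(content, bytes) else content
        parts.append(text)
        print('Read {} ({} characters)'.format(name, len(text)))
    # Append a newline after each file, matching the loop above.
    return ''.join(part + '\n' for part in parts)


# Stand-in for the dict returned by files.upload(), so this runs outside Colab.
sample = {'a.txt': b'hello', 'b.txt': b'world'}
combined = combine_uploaded(sample)
```

In a notebook you would pass the real result instead, for example `combine_uploaded(files.upload())`.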

Google Drive

# Colab provides the `drive` helper for accessing Google Drive from a notebook
import glob
from google.colab import drive

# Mount Drive; this requires authentication, so follow the steps as guided
drive.mount('/content/drive')
data_files = glob.glob("/content/drive/My Drive/Colab Notebooks/*.txt")
all_data = ''
for data_file in data_files:
    print('Reading file {}'.format(data_file))
    with open(data_file, 'r') as f:
        all_data += f.read()
    print('Total length read so far is {}'.format(len(all_data)))
    all_data += '\n'
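Once Drive is mounted, its files behave like ordinary local files, so the reading step can be tested without Colab at all. The sketch below uses a hypothetical helper, `read_text_files`, and a temporary directory standing in for the mounted Drive folder. Unlike the loop above, it sorts the matches and joins them with a newline rather than appending one after every file.

```python
import glob
import os
import tempfile


def read_text_files(pattern, encoding='utf-8'):
    """Return the concatenated contents of all files matching a glob pattern."""
    chunks = []
    # sorted() makes the read order deterministic; glob order is arbitrary.
    for path in sorted(glob.glob(pattern)):
        with open(path, 'r', encoding=encoding) as handle:
            chunks.append(handle.read())
    return '\n'.join(chunks)


# Demonstration on a temporary directory standing in for the mounted Drive folder.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [('one.txt', 'first'), ('two.txt', 'second')]:
        with open(os.path.join(folder, name), 'w', encoding='utf-8') as handle:
            handle.write(text)
    drive_text = read_text_files(os.path.join(folder, '*.txt'))
```

In Colab, the pattern would instead be the Drive path used above, such as "/content/drive/My Drive/Colab Notebooks/*.txt".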

S3

import boto3
import botocore

BUCKET_NAME = 'my-bucket'  # replace with your bucket name
KEY = 'image_in_s3.jpg'  # replace with your object key

s3 = boto3.resource('s3')
try:
    # download the JPEG image stored in S3 as `image_in_s3.jpg`
    # to the Colab working directory as `image_in_colab.jpg`
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'image_in_colab.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
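The S3 snippet above assumes boto3 can find AWS credentials, which a fresh Colab runtime will not have. One possible approach, sketched here as an assumption rather than as the original notebook's method, is to set the standard environment variables that boto3 reads before creating the resource. `set_aws_credentials` is a hypothetical helper and the values shown are placeholders.

```python
import os


def set_aws_credentials(access_key, secret_key, region=None):
    """Expose AWS credentials through the environment variables boto3 reads."""
    os.environ['AWS_ACCESS_KEY_ID'] = access_key
    os.environ['AWS_SECRET_ACCESS_KEY'] = secret_key
    if region is not None:
        os.environ['AWS_DEFAULT_REGION'] = region


# Placeholder values for illustration only; never hard-code real keys in a notebook.
set_aws_credentials('EXAMPLE_KEY_ID', 'EXAMPLE_SECRET', region='us-east-1')
```

In a shared notebook it is safer to collect the real values interactively, for instance with `getpass.getpass()`, so that the keys are never saved in a cell. Call the helper before `boto3.resource('s3')` so the credentials are in place when the resource is created.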

Read "Handling large files in Colab" for some other methods of uploading and downloading your data files to Colab.

Why use Google Colab?

It never hurts to use free stuff that comes with so much goodness packaged in. It also makes it a breeze to pair-program and review code in the same place. Last but not least, the processing power it provides (a free K80 GPU) is to die for.