Amazon Neptune – A Fully Managed Graph Database Service

Axin • Published: 2019-01-14

Of all the data structures and algorithms we use to enable our modern lives, graphs are changing the world every day. Businesses continuously create and ingest rich data with complex relationships. Yet developers are still forced to model these complex relationships in traditional databases. This leads to frustratingly complex queries with high costs and increasingly poor performance as you add relationships. We want to make it easy for you to deal with these modern and increasingly complex datasets, relationships, and patterns.

Hello, Amazon Neptune

Today we’re launching a limited preview (sign up here) of Amazon Neptune, a fast and reliable graph database service that makes it easy to gain insights from relationships among your highly connected datasets. The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with milliseconds of latency. Delivered as a fully managed database, Amazon Neptune frees customers to focus on their applications rather than tedious undifferentiated operations like maintenance, patching, backups, and restores. The service supports fast failover, point-in-time recovery, and Multi-AZ deployments for high availability. With support for up to 15 read replicas, you can scale query throughput to hundreds of thousands of queries per second. Amazon Neptune runs within your Amazon Virtual Private Cloud and allows you to encrypt your data at rest, giving you complete control over your data integrity in transit and at rest.

There are a lot of interesting features in this service, but graph databases may be an unfamiliar topic for many of you, so let’s make sure we’re using the same vocabulary.

Graph Databases

A graph database is a store of vertices (nodes) and edges (relationships or connections), both of which can have properties stored as key-value pairs. Graph databases are useful for connected, contextual, relationship-driven data. Some example applications are social media networks, recommendation engines, driving directions, logistics, diagnostics, fraud detection, and genomic sequencing.
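To make the vocabulary concrete, here is a minimal in-memory sketch of a property graph in Python. It is only an illustration of the data model described above, not how Neptune stores data; the class names, the `neighbors` helper, and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vertex:
    id: str
    label: str
    # Properties are plain key-value pairs.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    label: str
    from_id: str
    to_id: str
    # Edges can carry properties too.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge: Edge) -> None:
        # Reject edges whose endpoints are not in the graph yet.
        if edge.from_id not in self.vertices or edge.to_id not in self.vertices:
            raise KeyError("both edge endpoints must already exist")
        self.edges.append(edge)

    def neighbors(self, vertex_id: str, edge_label: str) -> List[Vertex]:
        # Follow outgoing edges with the given label.
        return [
            self.vertices[e.to_id]
            for e in self.edges
            if e.from_id == vertex_id and e.label == edge_label
        ]


graph = PropertyGraph()
graph.add_vertex(Vertex("alice", "Person", {"name": "Alice"}))
graph.add_vertex(Vertex("bob", "Person", {"name": "Bob"}))
graph.add_edge(Edge("e1", "follows", "alice", "bob", {"since": "2017"}))
print([v.properties["name"] for v in graph.neighbors("alice", "follows")])
```

The point of the sketch is that relationships are first-class records you can walk directly, rather than something reconstructed through joins.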

Amazon Neptune supports two open standards for describing and querying your graphs:

Apache TinkerPop3-style Property Graphs queried with Gremlin. Gremlin is a graph traversal language where a query is a traversal made up of discrete steps, each following an edge to a node. Your existing tools and clients that are designed to work with TinkerPop allow you to quickly get started with Neptune.
Resource Description Framework (RDF) queried with SPARQL. SPARQL is a declarative language based on Semantic Web standards from W3C. It follows a subject->predicate->object model. Specifically, Neptune supports the following standards: RDF 1.1, SPARQL Query 1.1, SPARQL Update 1.1, and the SPARQL Protocol 1.1.
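As a rough illustration of the subject->predicate->object model, the Python sketch below stores a few triples and matches them against a pattern in which `None` plays the role of a SPARQL variable. This is a toy matcher, not a SPARQL engine; the `ex:` names and the `match` helper are made up for the example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical data; the "ex:" prefix stands in for a real namespace.
TRIPLES: List[Triple] = [
    ("ex:paul", "ex:name", "Paul McCartney"),
    ("ex:paul", "ex:attends", "ex:SRV422"),
    ("ex:john", "ex:attends", "ex:ARC307"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching the pattern; None matches anything."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly the idea behind: select ?s ?o where { ?s ex:attends ?o }
for subject, _, obj in match(TRIPLES, p="ex:attends"):
    print(subject, obj)
```

A Gremlin traversal, by contrast, spells out the steps to take through the graph instead of declaring a pattern to match.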

If you have existing applications that work with SPARQL or TinkerPop, you should be able to start using Neptune by simply updating the endpoint your applications connect to.

Let’s walk through launching Amazon Neptune.

Launching Amazon Neptune

Start by navigating to the Neptune console, then click “Launch Neptune” to start the launch wizard.

On this first screen you simply name your instance and select an instance type. Next we configure the advanced options. Many of these may look familiar to you if you’ve launched an instance-based AWS database service before, like Amazon Relational Database Service (RDS) or Amazon ElastiCache.

Amazon Neptune runs securely in your VPC and can create its own security group, to which you can add your EC2 instances for easy access.

Next, we are able to configure some additional options like the parameter group, port, and a cluster name.

On this next screen we can enable KMS-based encryption at rest and set the failover priority and a backup retention time.

Similar to RDS, maintenance of the database can be handled by the service.

Once the instances are done provisioning you can find your connection endpoint on the Details page of the cluster. In my case it’s triton.cae1ofmxxhy7.us-east-1.rds.amazonaws.com.

Using Amazon Neptune

As stated above there are two different query engines that you can use with Amazon Neptune.

To connect to the Gremlin endpoint, append /gremlin to your cluster endpoint and send a request like this:


curl -X POST -d '{"gremlin":"g.V()"}' http://your-neptune-endpoint:8182/gremlin

You can similarly connect to the SPARQL endpoint with /sparql:


curl -G http://your-neptune-endpoint:8182/sparql --data-urlencode 'query=select ?s ?p ?o where {?s ?p ?o}'
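The same two requests can be sketched in Python with only the standard library. The block below just builds the request objects so you can inspect them; it does not contact a server. The host name is the placeholder from the curl examples, and the helper names `gremlin_request` and `sparql_request` are assumptions made for this illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint taken from the curl examples above.
ENDPOINT = "http://your-neptune-endpoint:8182"


def gremlin_request(query: str, endpoint: str = ENDPOINT) -> Request:
    # Mirrors: curl -X POST -d '{"gremlin":"..."}' .../gremlin
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return Request(f"{endpoint}/gremlin", data=body, method="POST")


def sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    # Mirrors: curl -G .../sparql --data-urlencode 'query=...'
    return Request(f"{endpoint}/sparql?{urlencode({'query': query})}", method="GET")


g_req = gremlin_request("g.V()")
s_req = sparql_request("select ?s ?p ?o where {?s ?p ?o}")
print(g_req.get_method(), g_req.full_url, g_req.data)
print(s_req.get_method(), s_req.full_url)
```

To actually send one of these you would pass it to `urllib.request.urlopen` from a machine that can reach the cluster inside your VPC.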

Before we can query data we need to populate our database. Let’s imagine we’re modeling AWS re:Invent and use the bulk loading API to insert some data.

For Property Graph, Neptune supports CSVs stored in Amazon Simple Storage Service (S3) for loading nodes, node properties, edges, and edge properties.

A typical CSV for vertices looks like this:

~label,name,email,title,~id
Attendee,George Harrison,[email protected],Lead Guitarist,1
Attendee,John Lennon,[email protected],Guitarist,2
Attendee,Paul McCartney,[email protected],Lead Vocalist,3

The edges CSV looks something like this:

~label,~from,~to,~id
attends,2,ARC307,attends22
attends,3,SRV422,attends27
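If your source data lives in application code rather than in files, CSVs in this shape can be produced with Python’s `csv` module. The following is a small sketch under that assumption; the column headers are copied from the examples above, while the `to_csv` helper and the example.com email address are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Column headers copied from the sample CSVs in this post.
VERTEX_HEADER = ["~label", "name", "email", "title", "~id"]
EDGE_HEADER = ["~label", "~from", "~to", "~id"]


def to_csv(header: List[str], rows: List[Dict[str, str]]) -> str:
    """Serialize rows to CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


vertices = [
    {
        "~label": "Attendee",
        "name": "Paul McCartney",
        "email": "paul@example.com",  # hypothetical placeholder address
        "title": "Lead Vocalist",
        "~id": "3",
    }
]
edges = [{"~label": "attends", "~from": "3", "~to": "SRV422", "~id": "attends27"}]

vertex_csv = to_csv(VERTEX_HEADER, vertices)
edge_csv = to_csv(EDGE_HEADER, edges)
print(vertex_csv)
print(edge_csv)
```

The resulting text would then be uploaded to S3 before calling the loader.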

Now to load a similarly structured CSV into Neptune we run something like this:

curl -H 'Content-Type: application/json' \
https://neptune-endpoint:8182/loader -d '
{
    "source": "s3://super-secret-reinvent-data/vertex.csv",
    "format": "csv",
    "region": "us-east-1",
    "accessKey": "AKIATHESEARENOTREAL",
    "secretKey": "ThEseARE+AlsoNotRea1K3YSl0l1234coVFefE12"  
}'

Which would return:

{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "2cafaa88-5cce-43c9-89cd-c1e68f4d0f53"
    }
}

I could take that result and query the loading status:

curl https://neptune-endpoint:8182/loader/2cafaa88-5cce-43c9-89cd-c1e68f4d0f53

{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [{"LOAD_COMPLETED" : 1}],
        "overallStatus" : {
            "fullUri" : "s3://super-secret-reinvent-data/stuff.csv",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 1,
            "totalRecords" : 987,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}
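A script that waits for a load to finish needs to read this response. The Python sketch below parses the sample status shown above and condenses it into a short summary. It relies only on the fields visible in that sample; `LOAD_COMPLETED` is the only status value that appears in this post, so the sketch treats anything else as "not done", and the `summarize_load` helper is hypothetical.

```python
import json
from typing import Any, Dict

# The sample status response from this post, verbatim.
SAMPLE_STATUS = """
{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [{"LOAD_COMPLETED" : 1}],
        "overallStatus" : {
            "fullUri" : "s3://super-secret-reinvent-data/stuff.csv",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 1,
            "totalRecords" : 987,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}
"""

ERROR_FIELDS = ("parsingErrors", "datatypeMismatchErrors", "insertErrors")


def summarize_load(response_text: str) -> Dict[str, Any]:
    """Condense a loader status response into done/records/errors."""
    overall = json.loads(response_text)["payload"]["overallStatus"]
    return {
        # Assumption: any status other than LOAD_COMPLETED means "not done".
        "done": overall["status"] == "LOAD_COMPLETED",
        "records": overall["totalRecords"],
        "errors": sum(overall.get(name, 0) for name in ERROR_FIELDS),
    }


summary = summarize_load(SAMPLE_STATUS)
print(summary)
```

In a real script you would fetch the status URL repeatedly and stop once `done` is true, or once the error count is no longer zero.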

For this particular data serialization format, I’d repeat this loading process for my edges as well.

For RDF, Neptune supports four serializations: Turtle, N-Triples, N-Quads, and RDF/XML. I could load all of these through the same loading API.

Now that I have my data in my database I can run some queries. In Gremlin, we write our queries as Graph Traversals. I’m a big Paul McCartney fan so I want to find all of the sessions he’s attending:

g.V().has("name","Paul McCartney").out("attends").id()

This defines a graph traversal that finds all of the nodes that have the property “name” with the value “Paul McCartney” (there’s only one!). Next it follows all of the edges from that node that are of the type “attends” and gets the ids of the resulting nodes.


==>ENT332
==>SRV422
==>DVC201
==>GPSBUS216
==>ENT323

Paul looks like a busy guy.
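To show what each step of that traversal does, here is a toy Python re-enactment over a tiny in-memory graph. It is not Gremlin and not how Neptune evaluates queries. The data is a hand-typed subset consistent with the examples in this post, vertices are represented by their ids (so the final `id()` step is implicit), and the result order is simply list order.

```python
from typing import Dict, List, Tuple

# Vertex id -> properties; a hand-typed subset of the sample data.
VERTICES: Dict[str, Dict[str, str]] = {
    "2": {"name": "John Lennon"},
    "3": {"name": "Paul McCartney"},
}

# Edges as (label, from_id, to_id).
EDGES: List[Tuple[str, str, str]] = [
    ("attends", "2", "ARC307"),
    ("attends", "3", "ENT332"),
    ("attends", "3", "SRV422"),
    ("attends", "3", "DVC201"),
    ("attends", "3", "GPSBUS216"),
    ("attends", "3", "ENT323"),
]


def has(vertex_ids: List[str], key: str, value: str) -> List[str]:
    # Keep only vertices whose property `key` equals `value`.
    return [v for v in vertex_ids if VERTICES.get(v, {}).get(key) == value]


def out(vertex_ids: List[str], label: str) -> List[str]:
    # Follow outgoing edges with the given label to their target ids.
    return [
        dst
        for src in vertex_ids
        for (lbl, frm, dst) in EDGES
        if frm == src and lbl == label
    ]


# g.V()                       -> every vertex id
# .has("name", ...)           -> filter by property
# .out("attends").id()        -> follow edges, report target ids
sessions = out(has(list(VERTICES), "name", "Paul McCartney"), "attends")
print(sessions)
```

Each step takes the output of the previous one as its input, which is the sense in which a Gremlin query is a chain of discrete steps.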

Hopefully this gives you a brief overview of the capabilities of graph databases. Graph databases open up a new set of possibilities for a lot of customers and Amazon Neptune makes it easy to store and query your data at scale. I’m excited to see what amazing new products our customers build. Sign up for the preview today.

P.S. Major thanks to Brad Bebee and Divij Vaidya for helping to create this post!
