
Amazon Neptune – A Fully Managed Graph Database Service

Of all the data structures and algorithms we use to enable our modern lives, graphs are changing the world every day. Businesses continuously create and ingest rich data with complex relationships. Yet developers are still forced to model these complex relationships in traditional databases. This leads to frustratingly complex queries with high costs and increasingly poor performance as you add relationships. We want to make it easy for you to deal with these modern and increasingly complex datasets, relationships, and patterns.

Hello, Amazon Neptune

Today we’re launching a limited preview of Amazon Neptune, a fast and reliable graph database service that makes it easy to gain insights from relationships among your highly connected datasets. The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Delivered as a fully managed database, Amazon Neptune frees customers to focus on their applications rather than on tedious, undifferentiated operations like maintenance, patching, backups, and restores. The service supports fast failover, point-in-time recovery, and Multi-AZ deployments for high availability. With support for up to 15 read replicas, you can scale query throughput to hundreds of thousands of queries per second. Amazon Neptune runs within your Amazon Virtual Private Cloud and allows you to encrypt your data at rest, giving you complete control over your data integrity in transit and at rest.

There are a lot of interesting features in this service, but graph databases may be an unfamiliar topic for many of you, so let’s make sure we’re using the same vocabulary.

Graph Databases

A graph database is a store of vertices (nodes) and edges (relationships or connections), both of which can have properties stored as key-value pairs. Graph databases are useful for connected, contextual, relationship-driven data. Some example applications are social media networks, recommendation engines, driving directions, logistics, diagnostics, fraud detection, and genomic sequencing.
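To make the vocabulary concrete, here is a minimal in-memory property graph in Python. It is only a sketch of the data model described above (vertices and edges, each with an id, a label, and key-value properties); the class and method names are hypothetical and say nothing about how Neptune actually stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Edge:
    id: str
    label: str
    from_id: str  # id of the source vertex
    to_id: str    # id of the target vertex
    properties: dict = field(default_factory=dict)  # edges carry properties too


@dataclass
class PropertyGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> Vertex
    edges: list = field(default_factory=list)

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        return self.vertices[vid]

    def add_edge(self, eid, label, from_id, to_id, **props):
        # An edge only makes sense between two vertices that already exist.
        if from_id not in self.vertices or to_id not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        edge = Edge(eid, label, from_id, to_id, props)
        self.edges.append(edge)
        return edge


g = PropertyGraph()
g.add_vertex("3", "Attendee", name="Paul McCartney", title="Lead Vocalist")
g.add_vertex("SRV422", "Session")
g.add_edge("attends27", "attends", "3", "SRV422")
print(g.vertices["3"].properties["name"])  # Paul McCartney
```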

Amazon Neptune supports two open standards for describing and querying your graphs:

  • Apache TinkerPop3-style Property Graphs queried with Gremlin. Gremlin is a graph traversal language where a query is a traversal made up of discrete steps, such as following an edge to a node. Existing tools and clients designed to work with TinkerPop let you get started with Neptune quickly.
  • Resource Description Framework (RDF) queried with SPARQL. SPARQL is a declarative query language based on Semantic Web standards from the W3C, and RDF data follows a subject->predicate->object model. Specifically, Neptune supports the following standards: RDF 1.1, SPARQL Query 1.1, SPARQL Update 1.1, and the SPARQL Protocol 1.1.
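The subject->predicate->object model can be sketched in a few lines: store triples, then match a pattern in which terms starting with `?` are variables, roughly what a single SPARQL triple pattern such as `?s ?p ?o` does. This is a toy matcher for illustration only, not a SPARQL engine, and the `http://example.org/` IRIs are hypothetical.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The namespace below is hypothetical.
EX = "http://example.org/"
triples = {
    (EX + "paul", EX + "name", "Paul McCartney"),
    (EX + "paul", EX + "attends", EX + "SRV422"),
    (EX + "john", EX + "attends", EX + "ARC307"),
}


def match(pattern, data):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables (as in SPARQL); any other term
    must equal the corresponding part of the triple exactly.
    """
    results = []
    for triple in sorted(data):  # sorted only to make output order stable
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?s ?o WHERE { ?s ex:attends ?o }
for row in match(("?s", EX + "attends", "?o"), triples):
    print(row["?s"], "->", row["?o"])
```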

If you have existing applications that work with SPARQL or TinkerPop, you should be able to start using Neptune by simply updating the endpoint your applications connect to.

Let’s walk through launching Amazon Neptune.

Launching Amazon Neptune

Start by navigating to the Neptune console, then click “Launch Neptune” to start the launch wizard.

On this first screen you simply name your instance and select an instance type. Next we configure the advanced options. Many of these may look familiar to you if you’ve launched an instance-based AWS database service before, like Amazon Relational Database Service (RDS) or Amazon ElastiCache.

Amazon Neptune runs securely in your VPC and can create its own security group that you can add your EC2 instances to for easy access.

Next, we are able to configure some additional options like the parameter group, port, and a cluster name.

On the next screen, we can enable KMS-based encryption at rest, set the failover priority, and choose a backup retention period.

As with RDS, maintenance of the database can be handled by the service.

Once the instances are done provisioning you can find your connection endpoint on the Details page of the cluster. In my case it’s triton.cae1ofmxxhy7.us-east-1.rds.amazonaws.com.

Using Amazon Neptune

As stated above there are two different query engines that you can use with Amazon Neptune.

To query the Gremlin endpoint, append /gremlin to your endpoint and do something like:


curl -X POST -d '{"gremlin":"g.V()"}' http://your-neptune-endpoint:8182/gremlin
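The same call can be made from Python's standard library. This sketch only builds the request (it does not send it, since the endpoint lives inside your VPC); the JSON body mirrors the curl example, while the `Content-Type: application/json` header and the `gremlin_request` helper name are assumptions for illustration.

```python
import json
import urllib.request


def gremlin_request(endpoint, query, port=8182):
    """Build (but do not send) a POST to the /gremlin HTTP endpoint.

    The body has the same shape as the curl example: {"gremlin": "<query>"}.
    """
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        f"http://{endpoint}:{port}/gremlin",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, see lead-in
        method="POST",
    )


req = gremlin_request("your-neptune-endpoint", "g.V()")
print(req.get_method(), req.full_url)
# From a host inside the VPC you could then send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```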

You can similarly query the SPARQL endpoint at /sparql:


curl -G http://your-neptune-endpoint:8182/sparql --data-urlencode 'query=select ?s ?p ?o where {?s ?p ?o}'
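`curl -G --data-urlencode` turns the query into a URL-encoded `query` parameter on a GET request. A rough Python equivalent of that URL construction is sketched below; `sparql_get_url` is a hypothetical helper, and the round-trip check shows the server would decode the original query text.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def sparql_get_url(endpoint, query, port=8182):
    """Build a GET URL equivalent to curl -G --data-urlencode 'query=...'."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    params = urlencode({"query": query}, quote_via=quote)
    return f"http://{endpoint}:{port}/sparql?{params}"


url = sparql_get_url("your-neptune-endpoint", "select ?s ?p ?o where {?s ?p ?o}")
print(url)
# Round trip: decoding the query string gives back the original SPARQL text.
print(parse_qs(urlsplit(url).query)["query"][0])
```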

Before we can query data, we need to populate our database. Let’s imagine we’re modeling AWS re:Invent and use the bulk loading API to insert some data.
For property graphs, Neptune supports loading nodes, node properties, edges, and edge properties from CSV files stored in Amazon Simple Storage Service (S3).

A typical CSV for vertices looks like this:

~label,name,email,title,~id
Attendee,George Harrison,[email protected],Lead Guitarist,1
Attendee,John Lennon,[email protected],Guitarist,2
Attendee,Paul McCartney,[email protected],Lead Vocalist,3
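In this format the columns prefixed with `~` (`~id`, `~label`) are system columns, and the remaining columns become vertex properties. The sketch below reads a file of that shape with Python's `csv` module to show the split; the sample rows copy the names and titles above, but the email addresses are placeholders and `read_vertices` is a hypothetical helper.

```python
import csv
import io

# Same shape as the vertex file above; email addresses are placeholders.
VERTEX_CSV = """~label,name,email,title,~id
Attendee,George Harrison,george@example.com,Lead Guitarist,1
Attendee,John Lennon,john@example.com,Guitarist,2
Attendee,Paul McCartney,paul@example.com,Lead Vocalist,3
"""


def read_vertices(text):
    """Separate the system columns (~id, ~label) from ordinary properties."""
    vertices = {}
    for row in csv.DictReader(io.StringIO(text)):
        vid, label = row.pop("~id"), row.pop("~label")
        vertices[vid] = {"label": label, "properties": row}
    return vertices


vertices = read_vertices(VERTEX_CSV)
print(vertices["3"]["properties"]["title"])  # Lead Vocalist
```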

The edges CSV looks something like this:

~label,~from,~to,~id
attends,2,ARC307,attends22
attends,3,SRV422,attends27
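Each edge row names its source (`~from`) and target (`~to`) vertex ids. A quick pre-flight check like the one sketched here can catch malformed headers and report endpoints that are not in the vertex file you have at hand, before anything is uploaded to S3. The `read_edges` helper is hypothetical and is not part of Neptune's loader.

```python
import csv
import io
from collections import defaultdict

EDGE_CSV = """~label,~from,~to,~id
attends,2,ARC307,attends22
attends,3,SRV422,attends27
"""


def read_edges(text, known_ids):
    """Group edges by source vertex and collect endpoints not in known_ids."""
    reader = csv.DictReader(io.StringIO(text))
    missing_cols = {"~id", "~from", "~to", "~label"} - set(reader.fieldnames)
    if missing_cols:
        # A header typo such as "~to " (trailing space) shows up here.
        raise ValueError(f"missing columns: {sorted(missing_cols)}")
    out_edges = defaultdict(list)
    dangling = set()
    for row in reader:
        out_edges[row["~from"]].append((row["~label"], row["~to"]))
        for end in (row["~from"], row["~to"]):
            if end not in known_ids:
                dangling.add(end)
    return dict(out_edges), dangling


known = {"1", "2", "3"}  # attendee ids from the vertex file
edges, dangling = read_edges(EDGE_CSV, known)
print(edges["3"])        # [('attends', 'SRV422')]
print(sorted(dangling))  # ['ARC307', 'SRV422']
```

Here the "dangling" ids are simply the session vertices, which would come from a separate vertex file in this example.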

To load a CSV like this into Neptune, we run something like:

curl -H 'Content-Type: application/json' \
https://neptune-endpoint:8182/loader -d '
{
    "source": "s3://super-secret-reinvent-data/vertex.csv",
    "format": "csv",
    "region": "us-east-1",
    "accessKey": "AKIATHESEARENOTREAL",
    "secretKey": "ThEseARE+AlsoNotRea1K3YSl0l1234coVFefE12"  
}'

Which would return:

{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "2cafaa88-5cce-43c9-89cd-c1e68f4d0f53"
    }
}
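The loader request body can also be built in code, one request per file. This sketch keeps the field names from the curl example but, as an assumption of this sketch rather than anything the post prescribes, reads the keys from the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables instead of pasting them into a shell. The `edge.csv` object name is hypothetical.

```python
import json
import os


def loader_body(source, region, fmt="csv"):
    """Build the JSON body for POST /loader using the curl example's fields.

    Credentials come from the usual AWS environment variables rather than
    being typed on the command line.
    """
    return json.dumps({
        "source": source,
        "format": fmt,
        "region": region,
        "accessKey": os.environ.get("AWS_ACCESS_KEY_ID", ""),
        "secretKey": os.environ.get("AWS_SECRET_ACCESS_KEY", ""),
    })


# One load request per file; "edge.csv" is a hypothetical second file.
for key in ("vertex.csv", "edge.csv"):
    body = json.loads(loader_body(f"s3://super-secret-reinvent-data/{key}", "us-east-1"))
    print(body["source"], body["format"])  # avoid printing the secret key
```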

With the loadId returned by the loader, I can query the load status:

curl https://neptune-endpoint:8182/loader/2cafaa88-5cce-43c9-89cd-c1e68f4d0f53

{
    "status" : "200 OK",
    "payload" : {
        "feedCount" : [{"LOAD_COMPLETED" : 1}],
        "overallStatus" : {
            "fullUri" : "s3://super-secret-reinvent-data/stuff.csv",
            "runNumber" : 1,
            "retryNumber" : 0,
            "status" : "LOAD_COMPLETED",
            "totalTimeSpent" : 1,
            "totalRecords" : 987,
            "totalDuplicates" : 0,
            "parsingErrors" : 0,
            "datatypeMismatchErrors" : 0,
            "insertErrors" : 0
        }
    }
}

With the CSV format, I’d repeat this loading process for my edges file as well.
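Whether the file holds vertices or edges, the status response can be checked the same way. The sketch below parses an abbreviated copy of the response shown above and sums the three error counters. Only the `LOAD_COMPLETED` status appears in this post, so the check simply treats any other value as "not finished cleanly"; `summarize_load` is a hypothetical helper.

```python
import json

# Abbreviated copy of the status response shown above.
STATUS_JSON = """{
  "status": "200 OK",
  "payload": {
    "feedCount": [{"LOAD_COMPLETED": 1}],
    "overallStatus": {
      "status": "LOAD_COMPLETED",
      "totalRecords": 987,
      "totalDuplicates": 0,
      "parsingErrors": 0,
      "datatypeMismatchErrors": 0,
      "insertErrors": 0
    }
  }
}"""

ERROR_FIELDS = ("parsingErrors", "datatypeMismatchErrors", "insertErrors")


def summarize_load(response_text):
    """Return (status, total records, total errors) from a load status response."""
    overall = json.loads(response_text)["payload"]["overallStatus"]
    errors = sum(overall.get(name, 0) for name in ERROR_FIELDS)
    return overall["status"], overall["totalRecords"], errors


status, records, errors = summarize_load(STATUS_JSON)
print(status, records, errors)  # LOAD_COMPLETED 987 0
if status != "LOAD_COMPLETED" or errors:
    raise RuntimeError("load did not finish cleanly")
```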

For RDF, Neptune supports four serializations: Turtle, N-Triples, N-Quads, and RDF/XML. I could load all of these through the same loading API.
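Of these, N-Triples is the simplest: one statement per line, IRIs in angle brackets, literals in double quotes, and a terminating ` .`. The sketch below writes two statements about the same attendee; the `http://example.org/reinvent/` namespace and the helper names are hypothetical, and the escaping covers only the common characters (backslash, quote, line breaks).

```python
# Hypothetical namespace for the re:Invent example; not a Neptune default.
EX = "http://example.org/reinvent/"


def iri(value):
    """An IRI term is written inside angle brackets."""
    return f"<{value}>"


def literal(text):
    """A plain string literal with backslash, quote and line breaks escaped."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """One N-Triples statement: three terms followed by ' .'"""
    return f"{subject} {predicate} {obj} ."


lines = [
    ntriple(iri(EX + "attendee/3"), iri(EX + "name"), literal("Paul McCartney")),
    ntriple(iri(EX + "attendee/3"), iri(EX + "attends"), iri(EX + "session/SRV422")),
]
print("\n".join(lines))
```

A file of such lines, uploaded to S3, is the kind of input the same loading API would accept for RDF.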

Now that my data is in the database, I can run some queries. In Gremlin, we write our queries as graph traversals. I’m a big Paul McCartney fan, so I want to find all of the sessions he’s attending:
g.V().has("name","Paul McCartney").out("attends").id()

This defines a graph traversal that finds all of the nodes that have the property “name” with the value “Paul McCartney” (there’s only one!). Next, it follows all of the outgoing edges from that node labeled “attends” and returns the ids of the resulting nodes.


==>ENT332
==>SRV422
==>DVC201
==>GPSBUS216
==>ENT323

Paul looks like a busy guy.
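To see what each step of `g.V().has("name","Paul McCartney").out("attends").id()` does, here is an in-memory imitation of the traversal in plain Python. It is not how Neptune evaluates Gremlin; the vertex and edge lists are hypothetical, with the session ids taken from the query output shown earlier. Because the sketch carries vertex ids throughout, the final `id()` step is implicit.

```python
# Hypothetical in-memory data; session ids are taken from the output above.
vertices = {
    "1": {"name": "George Harrison"},
    "2": {"name": "John Lennon"},
    "3": {"name": "Paul McCartney"},
}
edges = [  # (label, from vertex id, to vertex id)
    ("attends", "2", "ARC307"),
    ("attends", "3", "ENT332"),
    ("attends", "3", "SRV422"),
    ("attends", "3", "DVC201"),
    ("attends", "3", "GPSBUS216"),
    ("attends", "3", "ENT323"),
]


def has(vids, key, value):
    """Like has(key, value): keep vertices whose property equals value."""
    return [v for v in vids if vertices.get(v, {}).get(key) == value]


def out(vids, label):
    """Like out(label): follow outgoing edges carrying this label."""
    return [to for v in vids for (lbl, frm, to) in edges if frm == v and lbl == label]


# g.V()  ->  has("name", "Paul McCartney")  ->  out("attends")  ->  id()
print(out(has(list(vertices), "name", "Paul McCartney"), "attends"))
```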

Hopefully this gives you a brief overview of the capabilities of graph databases. Graph databases open up a new set of possibilities for a lot of customers and Amazon Neptune makes it easy to store and query your data at scale. I’m excited to see what amazing new products our customers build. Sign up for the preview today.

P.S. Major thanks to Brad Bebee and Divij Vaidya for helping to create this post!
