Neo4j is one of the best-known open source NoSQL graph databases. Plenty of open source graph databases are available, but Neo4j is currently one of the most popular. Online backup and high availability are among its notable features.
Neo4j supports graph traversal in a number of different ways. This article focuses on Cypher, Neo4j's own graph traversal language.
Cypher is a powerful and very intuitive language for traversing a complex graph. Many basic operations can be done very efficiently in Cypher. We recently evaluated Neo4j for a large project and used Cypher queries heavily.
Cypher is also handy for a Neo4j developer who needs to traverse and debug a graph from the admin web console. This article documents the queries we found most useful.
Below are some commonly used query snippets for you to use as a cheat sheet.
Count All Relationships In The Database
START r=rel(*) return count(r);
Count All Nodes In The Database
START n=node(*) return count(n);
Find A Node With A Field Name And Value
Let's say you want to find a node with the attribute “name” = “Joe”. The simple, but not very fast, way is to use a where clause in the Cypher query:
START n=node(*)
where n.name = 'Joe'
return n;
A much faster way to run the same query is to use an indexed field. To search on a field this way, the field must be indexed; in our example that means an index on the “name” attribute. Let's assume we have an index named PersonIndex on this field.
You can use the following query to find all nodes whose name is Joe:
START n=node:PersonIndex(name='Joe')
return n;
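An index-based start can also take a query string instead of an exact key and value. The sketch below assumes PersonIndex is a Lucene-backed legacy index (the default kind) and that wildcard matching is what you want; the pattern Jo* is only an illustration, not something from our project.

```
START n=node:PersonIndex("name:Jo*")
return n;
```

This form should return every node whose indexed name starts with “Jo”, such as Joe or John.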
Get A Vertex By Its Id
This is probably the simplest query. Let's say we want to get the node with id 1; use the query below:
START n=node(1)
return n;
Get The Attribute Of A Vertex By Its Id
Let's say the attribute name is “firstName”; the query below returns the first name of the vertex with id 10:
START n=node(10)
return n.firstName;
This works with any custom attribute you have added to your node. For example, “lastName” can be obtained like this:
START n=node(10)
return n.lastName;
Get All The Vertices For More Than One Id
To get the vertex objects themselves, use the query below:
START n= node(1,2,3,4,5,6,7)
return n;
To get a specific attribute of each vertex, say firstName, use the query below:
START n= node(1,2,3,4,5,6,7)
return n.firstName;
Get A Vertex By An Attribute
Let's say we want to find a vertex (or all vertices) with “firstName” equal to “John”.
The first way to do it is:
START n=node(*)
WHERE n.firstName = 'John'
return n;
However, this may not perform very well, so I would recommend an index-based query instead. For that you must have an index on the field you are searching. Let's assume we have an index named firstNameIdx on this field; the query then looks like this:
START n=node:firstNameIdx(firstName='John')
return n;
Get The ID Of A Vertex By An Attribute
Every node has an id, but it is not an ordinary attribute; a Cypher query returns it through the id() function.
For example, assuming an index named firstNameIdx on a field named firstName, you can retrieve the id of the matching node with the query below:
START n=node:firstNameIdx(firstName='John')
return id(n);
Get The Count Of Vertices With An Attribute Value
Let's say you just want to know how many people in your database have the first name “John”. This simple query can be written as below:
START n=node:firstNameIdx(firstName='John')
return count(n);
Get The Edge Of A Vertex With Label “friend”
Now the real fun with a graph language starts. You may already know that a vertex in a graph can have outgoing and incoming relations.
Let's say you want to know all the outgoing edges with label “friend”:
START n=node(1)
MATCH n-[r:friend]->()
return r;
Let's say you want to know all the incoming edges with label “friend”:
START n=node(1)
MATCH n<-[r:friend]-()
return r;
Let's say you want to know all the incoming and outgoing edges with label “friend”:
START n=node(1)
MATCH n-[r:friend]-()
return r;
Get The Count Of Edges Of A Vertex With Label “friend”
Count is a useful function that can be applied to vertices and edges to count them instead of returning the objects themselves.
Let's say you want to know how many friend edges are connected to the node with id 1:
START n=node(1)
MATCH n-[r:friend]-()
return count(r);
Get All Out Going Edges Of A Vertex
Leaving out the label matches every outgoing edge, whatever its label. Just as with a vertex, you can also fetch any attribute of an edge by the name of the attribute.
START n=node(1)
MATCH n-[r]->()
return r;
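To return an attribute of each edge rather than the edge itself, the same pattern can be reused. This is only a sketch: it assumes every outgoing edge of node 1 carries an attribute called “since”, which is a hypothetical name and not part of our data model. The type() function returns the label of the edge.

```
START n=node(1)
MATCH n-[r]->()
return type(r), r.since;
```

Replace since with whatever attribute your edges actually store.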
Get The Count Of All Out Going Edges Of A Vertex
Let's say you want to know how many outgoing edges the node with id 1 has (irrespective of relation label):
START n=node(1)
MATCH n-[r]->()
return count(r);
Get All Friends By A Relation
Get the first names of all people that are connected to the vertex with id 1 by an outgoing friend relation:
START n=node(1)
MATCH n-[:friend]->(p)
return p.firstName;
Get Age Of All Friends
Get the age of all people that are connected to a person (with last name ‘Doe’) by a friend relation.
This is another combination, showing that the starting point can be an indexed property and that any attribute of a connected vertex can be returned. Here we extract the age of all friends of every person whose last name is Doe.
Let's assume we have an index on the lastName field called lastNameIdx:
START n=node:lastNameIdx(lastName='Doe')
MATCH n-[:friend]->(p)
return p.age;
Get All Friends With Age 25
Get all people with age 25 that are connected to a person by a friend relation:
START n=node(1)
MATCH n-[:friend]->(p)
where p.age = 25
return p;
Get All Friends That Have Email Address
START n=node(1)
MATCH n-[:friend]->(p)
where p.email IS NOT NULL
return p;
Find All People That Have Age Greater Than 25
START n=node(*)
where n.age > 25
return n;
Though this seems like a simple query, it may not be very efficient, since it starts from every node in your graph database. For the best query performance, narrow the starting point as much as possible. For example, if we have an index on some attribute, say city, we can improve this query drastically with a simple index-based start:
start n=node:city(city='New York')
where n.age > 25
return n;
Find Person With Age Between
Find all people whose age is greater than 25 and less than 35.
This can be achieved with a where clause on the age attribute.
Note: This works with numbers only. Make sure the attribute is stored as a number, not a string.
start n=node(*)
where n.age > 25 and n.age < 35
return n;
Again, this query can be made more efficient with an indexed start. For example, if we have an index on city, we can improve this query drastically with a simple index-based start:
start n=node:city(city='New York')
where n.age > 25 and n.age < 35
return n;
Get Unique Results On A Complex Query
This can be done with the distinct keyword.
start n=node(100)
MATCH n-[rel]-d
return distinct d;
Get The Count Of Unique Results On A Complex Query
This can be achieved easily by combining the count function with distinct.
start n=node(100)
MATCH n-[rel]-d
return count(distinct d);
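Because the relationship is bound to the variable rel, the count can also be broken down by relation label. A minimal sketch, assuming you want the number of distinct neighbours of node 100 for each label; Cypher groups the count by the non-aggregated column type(rel).

```
start n=node(100)
MATCH n-[rel]-d
return type(rel), count(distinct d);
```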
Graph Manipulation Queries In Cypher
These queries can be used to manipulate the graph from the Cypher console or a client API.
Add A New Vertex (node) In The Graph
This can be done with the create statement in a Cypher query.
CREATE (myNode {name:'my test node'})
return myNode;
If you do not need a value returned from this statement, omit the variable name and the return statement, like this:
CREATE ({name:'my test node'});
Add Two New Vertices And A Relation (with Label ‘friend’) Between Them
This can easily be done in a single statement. Note how the variables (jdoe and mj) are defined simply by naming them inside the create pattern.
CREATE (jdoe {name:'John Doe'})-[r:friend]->(mj {name:'Mary Joe'})
return r, jdoe, mj;
Add A Relation Between Two Existing Vertices (nodes) With Id 1 And 2
Let's say we want to create a friend relation between the two nodes:
START first = node(1), second = node(2)
CREATE first-[r:friend]->second
return r;
Add A Relationship Between Two Nodes If It Does Not Exist
START first=node(1), second=node(2)
CREATE UNIQUE first-[r:friend]-second
return r;
Remove All Vertices Or Nodes From The Graph Database
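To delete all nodes in the database, use the Cypher query below. Neo4j will not delete a node that still has relationships, so remove the relationships first (see the next query).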
START n=node(*) delete n;
Remove All Edges Or Relationship From The Graph
To delete all relationships in the database, use the Cypher query below:
START r=rel(*) delete r;
Remove All Vertices With FirstName = ‘John’
Let's assume we have an index on the first name field called firstNameIndex; the query below deletes all nodes that have firstName ‘John’:
start n=node:firstNameIndex(firstName='John')
delete n;
Remove A Vertex Or Node With Id 1
START n=node(1) delete n;
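The query above fails if the node still has relationships attached. One possible variant, sketched here for the case where node 1 has at least one relationship, deletes the node together with its relationships in a single statement:

```
START n=node(1)
MATCH n-[r]-()
delete n, r;
```

If node 1 has no relationships, the pattern matches nothing and nothing is deleted, so the plain delete shown earlier is the one to use in that case.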
Remove An Edge With Id 1
START r=rel(1) delete r;
Summary
Neo4j provides a multitude of ways to traverse a graph. The REST API, Gremlin, the Java API and Cypher are the ones we looked into.
Gremlin is a general-purpose graph traversal language that is supported by a wide variety of graph database vendors, including Neo4j.
Cypher has great features that are advantageous to Neo4j users; however, unlike Gremlin, it can be used only with Neo4j.
I hope you find these queries handy for your Neo4j development. Let us know of any queries you think we should add to this list.
Hey Sachin, it would be awesome if you created a version of this blog post for Neo4j 2.0.
Thanks so much
Michael