GDS Sessions
A GDS Session is a temporary compute environment for running GDS workloads. It is a service offered by Neo4j and runs within Neo4j’s Aura cloud platform. A GDS Session reads data from a Neo4j DBMS through a remote projection, runs computations on the projected graph, and optionally writes the results back to the DBMS using remote write-back.
GDS Sessions are not available by default. Contact your account manager to have the feature enabled.
For ready-to-run notebooks, see our tutorials on GDS Sessions for AuraDB and self-managed databases.
1. GDS Session management
The GdsSessions object is the API entry point to the following operations:
- get_or_create: Create a new GDS Session, or connect to an existing one.
- list: List all currently active GDS Sessions.
- delete: Delete a GDS Session.
You need Neo4j Aura API credentials (CLIENT_ID and CLIENT_SECRET) to create a GdsSessions object.
See the Aura documentation for instructions on how to create API credentials from your Neo4j Aura account.
from graphdatascience.session import GdsSessions, AuraAPICredentials
CLIENT_ID = "my-aura-api-client-id"
CLIENT_SECRET = "my-aura-api-client-secret"
# Needs to be specified if your Aura account has multiple tenants
TENANT_ID = None
# Create a new GdsSessions object
sessions = GdsSessions(api_credentials=AuraAPICredentials(CLIENT_ID, CLIENT_SECRET, TENANT_ID))
1.1. Creating a GDS Session
To create a GDS Session, use the get_or_create() method.
It creates a new session if no session with the given name exists, or connects to the existing one if it does.
If a session with that name already exists but was created with different options, an error is raised.
The return value of get_or_create() is an AuraGraphDataScience object.
It offers an API similar to that of the GraphDataScience object, but it is configured to run on a GDS Session.
By convention, always use the variable name gds for the return value of get_or_create().
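The get-or-create behaviour described above can be pictured with a small in-memory model. This is a hypothetical sketch, not the client's implementation: ToySessionRegistry and SessionOptions are illustrative names, and the options are compared with plain equality.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionOptions:
    # Hypothetical stand-in for the options passed to get_or_create()
    memory: str
    db_uri: str
    ttl_hours: Optional[float] = None


@dataclass
class ToySessionRegistry:
    """Illustrative model of get-or-create semantics, not the client's implementation."""

    sessions: Dict[str, SessionOptions] = field(default_factory=dict)

    def get_or_create(self, name: str, options: SessionOptions) -> str:
        existing = self.sessions.get(name)
        if existing is None:
            # No session with this name yet: create it
            self.sessions[name] = options
            return f"created {name}"
        if existing != options:
            # Same name but different options: fail instead of silently reusing it
            raise ValueError(f"session {name!r} already exists with different options")
        # Same name and same options: connect to the existing session
        return f"connected {name}"
```

Calling get_or_create twice with identical options creates the session once and then connects to it; a third call with a different memory value raises ValueError.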
1.1.1. Syntax
To create a GDS Session, you need to provide the following information:
- Session name. The name must be unique.
- Session memory. This configuration determines the amount of memory and CPU available to the session, as well as the cost of running it. Available configurations are listed in our API reference. You can use the sessions.estimate() method to estimate the required size; the algorithm categories it accepts are also listed in our API reference.
- DBMS connection. This is a DbmsConnectionInfo object that contains the URI of a Neo4j instance, a username, and a password.
- TTL. This optional parameter specifies the time-to-live of the session. The session is deleted automatically once it has been unused for the given duration. Usage is defined as the computation of an algorithm or the projection of a graph.
- Cloud location. This is a CloudLocation object that specifies the cloud provider and region where the GDS Session will run. It is required if the DBMS connection is to a self-managed database.
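To make the TTL rule concrete, the following sketch models the idle timer under stated assumptions: IdleTimer is a hypothetical class, touch() stands for any graph projection or algorithm computation, and a session is treated as expired once its idle time reaches the TTL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IdleTimer:
    """Hypothetical model of the TTL rule: idle time is measured from the last use."""

    ttl: timedelta
    last_used: datetime

    def touch(self, at: datetime) -> None:
        # Call on every graph projection or algorithm computation
        self.last_used = at

    def expired(self, now: datetime) -> bool:
        # Expired once the session has been idle for at least the TTL
        return now - self.last_used >= self.ttl
```

With ttl=timedelta(hours=5), a session last used at 13:00 is still alive at 17:00 and counts as expired from 18:00 onward; any projection or algorithm run in between resets the clock.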
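1.1.2. Examples
The following example connects to an AuraDB instance and sizes the session with sessions.estimate():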
from datetime import timedelta
from graphdatascience.session import DbmsConnectionInfo, AlgorithmCategory
name = "my-new-session"
memory = sessions.estimate(
node_count=20,
relationship_count=50,
algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)
db_connection_info = DbmsConnectionInfo("neo4j+s://mydbid.databases.neo4j.io", "my-user", "my-password")
gds = sessions.get_or_create(
session_name=name,
memory=memory,
db_connection=db_connection_info,
ttl=timedelta(hours=5),
)
For a self-managed database, a cloud location is also required:
from datetime import timedelta
from graphdatascience.session import DbmsConnectionInfo, AlgorithmCategory, CloudLocation
name = "my-new-session-sm"
memory = sessions.estimate(
node_count=20,
relationship_count=50,
algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)
db_connection_info = DbmsConnectionInfo("neo4j://localhost", "my-user", "my-password")
cloud_location = CloudLocation(provider="gcp", region="europe-west1")
gds = sessions.get_or_create(
session_name=name,
memory=memory,
db_connection=db_connection_info,
ttl=timedelta(hours=5),
cloud_location=cloud_location,
)
1.2. Listing GDS Sessions
The list() method returns the name and memory size of all currently active GDS Sessions.
sessions.list()
1.3. Deleting a GDS Session
Use the delete() method to delete a GDS Session.
This terminates the session and stops any running costs from accumulating further.
Deleting a session does not affect the configured Neo4j data source.
However, any data not written back to the Neo4j instance is lost.
sessions.delete(session_name="my-new-session")
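Because a session accrues cost until it is deleted, it can help to tie its lifetime to a with block. The helper below is a hypothetical sketch, not part of the graphdatascience package; it only assumes a sessions object with the get_or_create() and delete() calls shown in this section.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_session(sessions: Any, session_name: str, **options: Any) -> Iterator[Any]:
    """Hypothetical helper: yield the object returned by get_or_create() and
    delete the session when the block exits, even if it raises."""
    gds = sessions.get_or_create(session_name=session_name, **options)
    try:
        yield gds
    finally:
        # Stops further costs; results not written back to the DBMS are lost here
        sessions.delete(session_name=session_name)
```

For example, with temporary_session(sessions, "my-new-session", memory=memory, db_connection=db_connection_info) as gds: runs the block and then deletes the session. Write back any results you need inside the block, since deletion discards them.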
2. Projecting graphs into a GDS Session
Once you have a GDS Session, you can project a graph into it. This operation is called remote projection because the data source is not a co-located database, but rather a remote one.
You can create a remote projection using the gds.graph.project() endpoint with a graph name, a Cypher query, and additional optional parameters.
The Cypher query must contain the gds.graph.project.remote() function to project the graph into the GDS Session.
2.1. Syntax
gds.graph.project(
graph_name: str,
query: str,
concurrency: int = 4,
undirected_relationship_types: Optional[List[str]] = None,
inverse_indexed_relationship_types: Optional[List[str]] = None,
batch_size: Optional[int] = None,
): (Graph, Series[Any])
Parameters:
Name | Optional | Default | Description |
---|---|---|---|
graph_name | no | - | Name of the graph. |
query | no | - | Projection query. |
concurrency | yes | 4 | Concurrency to use for building the graph within the session. |
batch_size | yes | None | Size of batches transmitted from the DBMS to the session. |
undirected_relationship_types | yes | None | List of relationship type names that should be treated as undirected. |
inverse_indexed_relationship_types | yes | None | List of relationship type names that should be indexed in reverse. |
Results:
Name | Type | Description |
---|---|---|
G | Graph | Graph object representing the projected graph. |
result | Series[Any] | Statistical data about the projection. |
The concurrency and batch_size configuration parameters can be used to tune the performance of the remote projection.
The concurrency parameter applies to building the graph within the session; the concurrency of the remote projection query itself is controlled by the Cypher runtime on the DBMS server.
Use CYPHER runtime=parallel as a query prefix to maximise performance.
The actual concurrency used depends on the DBMS server's available processors and current operational load.
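A minimal sketch of adding the runtime prefix to a projection query before passing it to gds.graph.project(). The helper with_parallel_runtime is hypothetical, and it assumes your DBMS version supports the parallel runtime; queries that already start with a CYPHER options prefix are returned unchanged.

```python
import re

# Matches a query that already starts with a CYPHER options prefix
_CYPHER_PREFIX = re.compile(r"^\s*CYPHER\b", re.IGNORECASE)


def with_parallel_runtime(query: str) -> str:
    """Hypothetical helper: prepend 'CYPHER runtime=parallel' unless the query
    already starts with a CYPHER options prefix."""
    if _CYPHER_PREFIX.match(query):
        return query
    return "CYPHER runtime=parallel\n" + query.lstrip()
```

Leaving explicit prefixes untouched keeps the choice with the caller, for example when a DBMS does not offer the parallel runtime.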
2.1.1. Remote projection query syntax
The remote projection query supports the same syntax as a Cypher projection, with two key differences:
- The graph name is not a parameter of the query. Instead, the graph name is provided to the gds.graph.project() endpoint.
- The gds.graph.project.remote() function must be used instead of the gds.graph.project() function.
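The two differences can be seen in a query generated for a single relationship pattern. This is an illustrative sketch: remote_projection_query is a hypothetical helper, it covers only one (source)-[rel]->(target) pattern without properties, and it interpolates labels and types directly, so it should only be given trusted identifiers.

```python
def remote_projection_query(source_label: str, rel_type: str, target_label: str) -> str:
    """Hypothetical helper: build a remote projection query for one
    (source)-[rel]->(target) pattern without properties."""
    return (
        f"MATCH (source:{source_label})\n"
        f"OPTIONAL MATCH (source)-[rel:{rel_type}]->(target:{target_label})\n"
        # No graph name in the query: it is passed to gds.graph.project() instead
        "RETURN gds.graph.project.remote(source, target, {\n"
        "  sourceNodeLabels: labels(source),\n"
        "  targetNodeLabels: labels(target),\n"
        "  relationshipType: type(rel)\n"
        "})"
    )
```

The result would be passed as the query argument, for example gds.graph.project("knows-graph", remote_projection_query("User", "KNOWS", "User")), where the graph name appears only in the endpoint call.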
For full details and examples on how to write Cypher projection queries, see the Cypher projection documentation in the GDS Manual.
2.1.2. Relationship type undirectedness and inverse indexing
The optional parameters undirected_relationship_types and inverse_indexed_relationship_types are used to configure undirectedness and inverse indexing of relationships.
They behave like the undirectedRelationshipTypes and inverseIndexedRelationshipTypes options of Cypher projection, as documented in the GDS Manual.
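The effect of the two options on the stored topology can be modelled with plain adjacency lists. This toy sketch is an illustration under stated assumptions, not how GDS stores graphs internally: undirected types are stored in both directions, while inverse-indexed types keep their direction and additionally get an incoming index. build_adjacency is a hypothetical name, and the sketch treats the two options as mutually exclusive per type.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, Iterable, List, Tuple

Rel = Tuple[str, str, str]  # (source, relationship type, target)


def build_adjacency(
    rels: Iterable[Rel],
    undirected_types: AbstractSet[str] = frozenset(),
    inverse_indexed_types: AbstractSet[str] = frozenset(),
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Toy model: return (outgoing, incoming_index) adjacency lists keyed by node."""
    outgoing: Dict[str, List[str]] = defaultdict(list)
    incoming: Dict[str, List[str]] = defaultdict(list)
    for source, rel_type, target in rels:
        outgoing[source].append(target)
        if rel_type in undirected_types:
            # Undirected: the relationship can be traversed from either end
            outgoing[target].append(source)
        elif rel_type in inverse_indexed_types:
            # Inverse indexing: direction is kept, incoming neighbours are indexed too
            incoming[target].append(source)
    return dict(outgoing), dict(incoming)
```

In the projection API, the calls in the test below would correspond to undirected_relationship_types=["KNOWS"] and inverse_indexed_relationship_types=["BOUGHT"].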
2.2. Example
This example shows how to project a graph into a GDS Session. The example graph is heterogeneous and models users and products. Users can know each other, and users can buy products. The database connection is to a new, empty AuraDB instance.
import os # for reading environment variables
from graphdatascience.session import SessionMemory, DbmsConnectionInfo, GdsSessions, AuraAPICredentials
sessions = GdsSessions(api_credentials=AuraAPICredentials(os.environ["CLIENT_ID"], os.environ["CLIENT_SECRET"]))
db_connection = DbmsConnectionInfo(os.environ["DB_URI"], os.environ["DB_USER"], os.environ["DB_PASSWORD"])
gds = sessions.get_or_create(
session_name="my-new-session",
memory=SessionMemory.m_8GB,
db_connection=db_connection,
)
gds.run_cypher(
"""
CREATE
(u1:User {name: 'Mats'}),
(u2:User {name: 'Florentin'}),
(p1:Product {name: 'ice cream', cost: 4.2}),
(p2:Product {name: 'computer', cost: 13.37})
CREATE
(u1)-[:KNOWS {since: 2020}]->(u2),
(u2)-[:BOUGHT {price: 7474}]->(p1),
(u1)-[:BOUGHT {price: 1337}]->(p2)
"""
)
With the gds session object active, project the graph and specify the node and relationship properties to include as follows:
G, result = gds.graph.project(
graph_name="my-graph",
query="""
CALL {
MATCH (u1:User)
OPTIONAL MATCH (u1)-[r:KNOWS]->(u2:User)
RETURN u1 AS source, r AS rel, u2 AS target, {} AS sourceNodeProperties, {} AS targetNodeProperties
UNION
MATCH (p:Product)
OPTIONAL MATCH (p)<-[r:BOUGHT]-(user:User)
RETURN user AS source, r AS rel, p AS target, {} AS sourceNodeProperties, {cost: p.cost} AS targetNodeProperties
}
RETURN gds.graph.project.remote(source, target, {
sourceNodeProperties: sourceNodeProperties,
targetNodeProperties: targetNodeProperties,
sourceNodeLabels: labels(source),
targetNodeLabels: labels(target),
relationshipType: type(rel),
relationshipProperties: properties(rel)
})
""",
)
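To see what the projection receives, the rows produced by the CALL { ... UNION ... } subquery for the example data can be written out by hand. The sketch below is a pure-Python model, not GDS code: it uses names as node identifiers and counts distinct nodes and non-null relationships. Note the row for Florentin, who has no outgoing KNOWS relationship: the OPTIONAL MATCH yields a null target, so only the source node is projected for that row.

```python
from typing import List, Optional, Tuple

# (source, relationship type or None, target or None), written out by hand
Row = Tuple[str, Optional[str], Optional[str]]

rows: List[Row] = [
    ("Mats", "KNOWS", "Florentin"),        # first branch, u1 = Mats
    ("Florentin", None, None),             # first branch, no outgoing KNOWS
    ("Florentin", "BOUGHT", "ice cream"),  # second branch
    ("Mats", "BOUGHT", "computer"),        # second branch
]


def projection_counts(rows: List[Row]) -> Tuple[int, int]:
    """Toy model: count distinct nodes and non-null relationships across rows."""
    nodes = set()
    rel_count = 0
    for source, rel_type, target in rows:
        nodes.add(source)
        if target is not None:
            nodes.add(target)
        if rel_type is not None:
            rel_count += 1
    return len(nodes), rel_count
```

Under this model, projection_counts(rows) returns (4, 3), so you would expect the projection statistics in result to report four nodes and three relationships.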
3. Running algorithms
You can run algorithms on a remotely projected graph in the same way you would on any projected graph. For instance, you can run the PageRank and FastRP algorithms on the projected graph from the previous example as follows:
gds.pageRank.mutate(G, mutateProperty="pr")
gds.fastRP.mutate(G, featureProperties=["pr"], embeddingDimension=2, nodeSelfInfluence=0.1, mutateProperty="embedding")
# Stream the results back together with the `name` property fetched from the database
gds.graph.nodeProperties.stream(G, db_node_properties=["name"], node_properties=["pr", "embedding"])
For a full list of the available algorithms, see the API reference.
3.1. Limitations
- Model Catalog is supported with limitations:
  - Trained models can only be used for prediction using the same Session in which they were trained. After the Session is deleted, all trained models will be lost.
  - Model publishing is not supported, including
    - gds.model.publish
  - Model persistence is not supported, including
    - gds.model.store
    - gds.model.load
    - gds.model.delete
- Topological Link Prediction algorithms are not supported, including
  - gds.alpha.linkprediction.adamicAdar
  - gds.alpha.linkprediction.commonNeighbors
  - gds.alpha.linkprediction.preferentialAttachment
  - gds.alpha.linkprediction.resourceAllocation
  - gds.alpha.linkprediction.sameCommunity
  - gds.alpha.linkprediction.totalNeighbors
4. Remote write-back
The GDS Session's in-memory graph was projected from data in the connected Neo4j DBMS (in the example above, an AuraDB instance), so write-back operations persist the data back to that same DBMS.
When you call any write operation, the GDS Python client automatically uses the remote write-back functionality.
This includes all .write algorithm modes as well as all .write graph operations.
By default, write-back happens concurrently, in one transaction per batch. The behaviour is controlled by three aspects:
- the size of the dataset (e.g., node count or relationship count)
- the configured batch size
- the configured concurrency
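The interaction of these three aspects can be estimated with simple arithmetic. The sketch below is a rough, hypothetical model: it assumes one transaction per batch, the default concurrency of twice the DBMS processor count given in the configuration table of this section, and batches of equal duration when computing rounds. The dbms_processors default of 4 is an arbitrary placeholder.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WritePlan:
    batches: int      # one transaction per batch
    concurrency: int  # batches written in parallel
    rounds: int       # sequential rounds, assuming equally fast batches


def plan_write_back(
    record_count: int,
    batch_size: int,
    concurrency: Optional[int] = None,
    dbms_processors: int = 4,  # hypothetical placeholder value
) -> WritePlan:
    """Rough, hypothetical model of how dataset size, batch size and concurrency
    interact during write-back."""
    if record_count < 0 or batch_size <= 0:
        raise ValueError("record_count must be >= 0 and batch_size must be > 0")
    # Default concurrency: twice the number of processors on the DBMS server
    workers = concurrency if concurrency is not None else 2 * dbms_processors
    if workers <= 0:
        raise ValueError("concurrency must be > 0")
    batches = math.ceil(record_count / batch_size)
    rounds = math.ceil(batches / workers)
    return WritePlan(batches=batches, concurrency=workers, rounds=rounds)
```

For instance, writing 100,000 values with a batch size of 25,000 and concurrency 12 gives 4 batches handled in a single round, so in this model raising the concurrency further would not speed things up; a smaller batch size would.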
4.1. Syntax
gds.graph.<operation>.write(
G: Graph,
# additional operation-specific parameters
**config: Any,
): Series[Any]
gds.<algo>.write(
G: Graph,
**config: Any,
): Series[Any]
All write-back endpoints support the following additional configuration:
Name | Optional | Default | Description |
---|---|---|---|
concurrency | yes | dynamic [1] | Concurrency to use for writing back to the DBMS. |
arrowConfiguration | yes | - | Dict containing additional configuration for the connection from the DBMS to the GDS Arrow Server. |
[1] Twice the number of processors on the DBMS server.
The arrowConfiguration dict supports the following key:
Name | Optional | Default | Description |
---|---|---|---|
batchSize | yes | | Size of batches retrieved by the DBMS from the session. |
4.2. Examples
Extending the previous example, we can write back the FastRP embeddings to the AuraDB instance as follows:
gds.graph.nodeProperties.write(G, "embedding")
To tune the performance of the write-back, configure concurrency and the batchSize key of arrowConfiguration.
This example shows how to do so with an algorithm's .write mode:
gds.wcc.write(
G,
writeProperty="wcc",
concurrency=12,
arrowConfiguration={"batchSize": 25000}
)
5. Querying the database
You can run Cypher queries on the connected Neo4j DBMS, such as the AuraDB instance, using the run_cypher() method.
There is no restriction on the type of query that can be run, but note that the query runs on the DBMS, not on the GDS Session.
Therefore, you cannot call GDS procedures through the run_cypher() method.
gds.run_cypher("MATCH (n:User) RETURN n.name, n.embedding")