ZhangZhihui's Blog  

CREATE

Syntax for nodes

You can bind each node to a variable that you can refer to later in the query. Multiple labels are separated by colons.

CREATE (charlie:Person:Actor {name: 'Charlie Sheen'}), (oliver:Person:Director {name: 'Oliver Stone'})

As of Neo4j 5.18, multiple labels can also be separated by an ampersand &, in the same manner as it is used in label expressions.

CREATE (charlie:Person&Actor {name: 'Charlie Sheen'}), (oliver:Person&Director {name: 'Oliver Stone'})

 

Reusing variables

CREATE p = (charlie:Person:Actor {name: 'Charlie Sheen'})-[:ACTED_IN {role: 'Bud Fox'}]->(wallStreet:Movie {title: 'Wall Street'})<-[:DIRECTED]-(oliver:Person:Director {name: 'Oliver Stone'}), (wallStreet)<-[:ACTED_IN {role: 'Gordon Gekko'}]-(michael:Person:Actor {name: 'Michael Douglas'})
RETURN length(p)

Creates all three nodes for Charlie Sheen, Oliver Stone and Michael Douglas and connects them all to the node representing the Wall Street movie. It then returns the length of the path from Charlie Sheen to Oliver Stone.

Note that when repeating a node’s variable, you may not add labels or properties to the repetition.

 

MATCH (charlie:Person {name: 'Charlie Sheen'})
CREATE (charlie:Actor)

This query will fail because the variable charlie has already been bound to a pre-existing node, and therefore it cannot be reused to create a new node. If you intend to add a label, use the SET clause instead.

 

Reusing variables in properties

MATCH (person:Person)
  WHERE person.name IS NOT NULL
CREATE (anotherPerson:Person {name: person.name, age: $age})

Such an expression may not contain a reference to a variable that is defined in the same CREATE statement. This is to ensure that the value of a property is always clear.

CREATE (charlie {score: oliver.score + 1}), (oliver {score: charlie.score + 1})

This query tries to create nodes such that Charlie’s score is higher than Oliver’s and vice versa, which is a contradiction. The query therefore fails.

 

Use parameters with CREATE

:param {
  "props": {
    "name": "Andy",
    "position": "Developer"
  }
}

 

CREATE (n:Person $props)
RETURN n

 

Create multiple nodes with a parameter for their properties

By providing Cypher an array of maps, it will create a node for each map.

复制代码
:param {
  "props": [ {
    "name": "Andy",
    "position": "Developer"
  }, {
    "name": "Michael",
    "position": "Developer"
  } ]
}
复制代码

 

UNWIND $props AS map
CREATE (n)
SET n = map

 

 

 

CREATE using dynamic node labels and relationship types

Node labels and relationship types can be referenced dynamically in expressions, parameters, and variables when creating nodes and relationships. This allows for more flexible queries and mitigates the risk of Cypher injection.

Syntax for creating nodes and relationships dynamically:

CREATE (n:$(<expr>))
CREATE ()-[r:$(<expr>)]->()

The expression must evaluate to a STRING NOT NULL | LIST<STRING NOT NULL> NOT NULL value. Using a LIST<STRING> with more than one item when creating a relationship using dynamic relationship types will fail. This is because a relationship can only have exactly one type.

:params {
  "nodeLabels": ["Person", "Director"],
  "relType": "DIRECTED",
  "movies": ["Ladybird", "Little Women", "Barbie"]
}

 

CREATE (greta:$($nodeLabels) {name: 'Greta Gerwig'})
WITH greta
UNWIND $movies AS movieTitle
CREATE (greta)-[rel:$($relType)]->(m:Movie {title: movieTitle})
RETURN greta.name AS name, labels(greta) AS labels, type(rel) AS relType, collect(m.title) AS movies

 

╒══════════════╤══════════════════════╤══════════╤══════════════════════════════════════╕
│name          │labels                │relType   │movies                                │
╞══════════════╪══════════════════════╪══════════╪══════════════════════════════════════╡
│"Greta Gerwig"│["Person", "Director"]│"DIRECTED"│["Ladybird", "Little Women", "Barbie"]│
└──────────────┴──────────────────────┴──────────┴──────────────────────────────────────┘

 

INSERTas a synonym ofCREATE

INSERT can be used as a synonym to CREATE for creating nodes and relationships, and was introduced as part of Cypher’s GQL conformance. However, INSERT requires that multiple labels are separated by an ampersand & and not by colon :.

INSERT (tom:Person&Actor&Director {name: 'Tom Hanks'})

 

DELETE

Example graph

The following graph is used for the examples below. It shows four actors, three of whom ACTED_IN the Movie The Matrix (Keanu ReevesCarrie-Anne Moss, and Laurence Fishburne), and one actor who did not act in it (Tom Hanks).

graph delete clause

 

Delete single node

MATCH (n:Person {name: 'Tom Hanks'})
DELETE n

This query is only possible to run on nodes without any relationships connected to them.

 

NODETACH keyword

It is also possible to delete the single node using the NODETACH DELETE clause. Using the NODETACH keyword explicitly defines that relationships will not be detached and deleted from a node. The NODETACH keyword is a mirror of the already existing keyword DETACH, and it was introduced as part of Cypher®'s GQL conformance. Including it is functionally the same as using simple DELETE.

MATCH (n:Person {name: 'Tom Hanks'})
NODETACH DELETE n

 

Delete relationships only

It is possible to delete a relationship while leaving the node(s) connected to that relationship otherwise unaffected.

MATCH (n:Person {name: 'Laurence Fishburne'})-[r:ACTED_IN]->()
DELETE r

 

Delete a node with all its relationships

MATCH (n:Person {name: 'Carrie-Anne Moss'})
DETACH DELETE n

 

Delete all nodes and relationships

MATCH (n)
DETACH DELETE n

Delete all nodes and relationships using CALL subqueries

MATCH (n)
CALL (n) {
 DETACH DELETE n
} IN TRANSACTIONS

 

Delete a database and recreate it

CREATE OR REPLACE DATABASE neo4j

 

FINISH

A query ending in FINISH — instead of RETURN — has no result but executes all its side effects. FINISH was introduced as part of Cypher®'s GQL conformance.

The following read query successfully executes but has no results:

MATCH (p:Person)
FINISH

The following query has no result but creates one node with the label Person:

CREATE (p:Person)
FINISH

It is equivalent to the following query:

CREATE (p:Person)

 

FOREACH

Lists and paths are key concepts in Cypher®. The FOREACH clause can be used to update data, such as executing update commands on elements in a path, or on a list created by aggregation.

The variable context within the FOREACH parenthesis is separate from the one outside it. This means that if you CREATE a node variable within a FOREACH, you will not be able to use it outside of the foreach statement, unless you match to find it.

Within the FOREACH parentheses, you can do any of the updating commands — SETREMOVECREATEMERGEDELETE, and FOREACH.

Mark all nodes along a path

This query sets the property marked to true on all nodes along a path.

MATCH p=(start)-[*]->(finish)
WHERE start.name = 'A' AND finish.name = 'D'
FOREACH (n IN nodes(p) | SET n.marked = true)

Mark all relationships along a path

This query sets the property marked to true on all relationships along a path.

MATCH p=(start)-[*]->(finish)
WHERE start.name = 'A' AND finish.name = 'D'
FOREACH ( r IN relationships(p) | SET r.marked = true )

Create new nodes from a list of name labels

This query creates a new node for each label in a list.

WITH ['E', 'F', 'G'] AS names
FOREACH ( value IN names | CREATE (:Person {name: value}) )

 

LIMIT

LIMIT constrains the number of returned rows.

LIMIT accepts any expression that evaluates to a positive INTEGER and does not refer to nodes or relationships.

MATCH (n)
RETURN n.name
ORDER BY n.name
LIMIT 1 + toInteger(3 * rand())

 

LIMIT and side effects

The use of LIMIT in a query will not stop side effects, like CREATEDELETE, or SET, from happening if the limit is in the same query part as the side effect.

 

UsingLIMITas a standalone clause

LIMIT can be used as a standalone clause, or in conjunction with ORDER BY or SKIP/OFFSET.

Standalone use of LIMIT:

MATCH (n)
LIMIT 2
RETURN collect(n.name) AS names

LIMIT used in conjunction with ORDER BY and SKIP:

The following query orders all nodes by name descending, skips the two first rows and limits the results to two rows. It then collects the results in a list.

MATCH (n)
ORDER BY n.name DESC
SKIP 2
LIMIT 2
RETURN collect(n.name) AS names

 

LOAD CSV

LOAD CSV is used to import data from CSV files into a Neo4j database.

LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv'
AS row
MERGE (:Artist {name: row[1], year: toInteger(row[2])})
FROM takes a STRING containing the path where the CSV file is located.
The clause parses one row at a time, temporarily storing the current row in the variable specified with AS.
The MERGE clause accesses the row variable to insert data into the database.

LOAD CSV supports both local and remote URLs. Local paths are resolved relative to the Neo4j installation folder.

 

Import CSV data into Neo4j

Import local files

You can store CSV files on the database server and then access them by using a file:/// URL. By default, paths are resolved relative to the Neo4j import directory.

LOAD CSV FROM 'file:///artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year

 

 

Import from a remote location

LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year

 

LOAD CSV FROM 'ftp://<username>:<password>@<domain>/bands/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year

 

Import from cloud URIs

You can import data from a number of different cloud storages:

See Operations Manual → Load a dump from a cloud storage on how to set up access to cloud storages.

LOAD CSV FROM 'azb://azb-account/azb-container/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year

 

LOAD CSV FROM 'gs://gs-bucket/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year

 

LOAD CSV FROM 's3://aws-bucket/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year

 

Import CSV files using dynamic columns

CSV columns can be referenced dynamically to map labels to nodes in the graph. This enables flexible data handling, allowing labels to be be populated from CSV column values without manually specifying each entry. It also mitigates the risk of Cypher® injection.

bands-with-headers.csv:

Id,Label,Name
1,Band,The Beatles
2,Band,The Rolling Stones
3,Band,Pink Floyd
4,Band,Led Zeppelin

 

LOAD CSV WITH HEADERS FROM 'file:///bands-with-headers.csv' AS line
MERGE (n:$(line.Label) {name: line.Name})
RETURN n AS bandNodes

 

 

 

Import compressed CSV files

LOAD CSV can read local CSV files compressed with ZIP or gzip. ZIP archives can have arbitrary directory structures but may only contain a single CSV file.

LOAD CSV FROM 'file:///artists.zip' AS row
MERGE (:Artist {name: row[1], year: toInteger(row[2])})

You can’t load zipped CSV files from remote URLs.

 

Import data from relational databases

If the source data comes from a relational model, it’s worth evaluating how to gain the most from moving to a graph data model. Before running the import, think about how the data can be modeled as a graph, and adapt its structure accordingly when running the import (see Graph data modeling).

Data from relational databases may consist of one or multiple CSV files, depending on the source database structure. A performant approach is to run multiple passes of LOAD CSV to import nodes separately from relationships.

books.csv:
id,title,author,publication_year,genre,rating,still_in_print,last_purchased
19515,The Heights,Anne Conrad,2012,Comedy,5,true,2023/4/12 8:17:00
39913,Starship Ghost,Michael Tyler,1985,Science Fiction|Horror,4.2,false,2022/01/16 17:15:56
60980,The Death Proxy,Tim Brown,2002,Horror,2.1,true,2023/11/26 8:34:26
18793,Chocolate Timeline,Mary R. Robb,1924,Romance,3.5,false,2022/9/17 14:23:45
67162,Stories of Three,Eleanor Link,2022,Romance|Comedy,2,true,2023/03/12 16:01:23
25987,Route Down Below,Tim Brown,2006,Horror,4.1,true,2023/09/24 15:34:18

 

复制代码
// Create `Book` nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/books.csv' AS row
MERGE (b:Book {id: row.id, title: row.title})
MERGE (a:Author {name: row.author});

// Create `WROTE` relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/books.csv' AS row
MATCH (a:Author{name: row.author})
MATCH (b:Book{id: row.id})
MERGE (a)-[:WROTE]->(b);
复制代码

 

acted_in.csv:

movieId,person_tmdbId,role
1,12899,Slinky Dog (voice)
1,12898,Buzz Lightyear (voice)
...

 

复制代码
// Create person nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {name: row.name, tmdbId: row.person_tmdbId});

// Create movie nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: row.movieId, title: row.title});

// Create relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: row.person_tmdbId})
MATCH (m:Movie {movieId: row.movieId})
MERGE (p)-[r:ACTED_IN {role: row.role}]->(m);
复制代码

 

Create additional node labels

MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p
SET p:Actor

 

Cast CSV columns to Neo4j data types

LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET p.name = row.name, p.born = date(row.born)
RETURN
  p.name AS name,
  p.tmdbId AS tmdbId,
  p.born AS born
LIMIT 5

 

Handle null values

Skip null values

LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id})

Provide a default for null values

LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id, hqLocation: coalesce(row.Location, "Unknown")})

Change empty STRING values to null values (not stored)

LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id})
SET c.email = nullIf(trim(row.Email), "")

nullif(): if the two values are equal, return null, otherwise return the first value.

 

Split list values

movies.csv:

movieId,title,budget,countries,movie_imdbId,imdbRating,imdbVotes,languages,plot,movie_poster,released,revenue,runtime,movie_tmdbId,movie_url,year,genres
1,Toy Story,30000000.0,USA,114709,8.3,591836,English,A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.,https://image.tmdb.org/t/p/w440_and_h660_face/uXDfjJbdP4ijW5hWSBrPrlKpxab.jpg,1995-11-22,373554033.0,81,862,https://themoviedb.org/movie/862,1995,Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji,65000000.0,USA,113497,6.9,198355,English|French,"When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be stopped by finishing the game.",https://image.tmdb.org/t/p/w440_and_h660_face/vgpXmVaVyUL7GGiDeiK1mKEKzcX.jpg,1995-12-15,262797249.0,104,8844,https://themoviedb.org/movie/8844,1995,Adventure|Children|Fantasy
...

 

复制代码
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {id: toInteger(row.movieId)})
SET
    m.title = row.title,
    m.imdbId = toInteger(row.movie_imdbId),
    m.languages = split(row.languages, '|'),
    m.genres = split(row.genres, '|')
RETURN
  m.title AS title,
  m.imdbId AS imdbId,
  m.languages AS languages,
  m.genres AS genres
LIMIT 5
复制代码

 

 

Create property uniqueness constraints

CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (p:Person) REQUIRE p.tmdbId IS UNIQUE

 

Handle large amounts of data

LOAD CSV may run into memory issues with files containing a significant number of rows (approaching hundreds of thousands or millions). For large files, it’s recommended to split the import process in several lighter transactions through the clause CALL {…​} IN TRANSACTIONS.

LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
CALL (row) {
  MERGE (p:Person {tmdbId: row.person_tmdbId})
  SET p.name = row.name, p.born = row.born
} IN TRANSACTIONS OF 200 ROWS

 

 

LOAD CSV and Neo4j functions

Access line numbers with linenumber()

The linenumber() function provides the line number which LOAD CSV is operating on, or null if called outside of a LOAD CSV context.

A common use case for this function is to generate sequential unique IDs for CSV data that doesn’t have a unique column already.

LOAD CSV FROM 'file:///artists.csv' AS row
RETURN linenumber() AS number, row

Access the CSV file path with file()

The file() function provides the absolute path of the file that LOAD CSV is operating on, or null if called out of a LOAD CSV context.

LOAD CSV FROM 'file:///artists.csv' AS row
RETURN DISTINCT file() AS path

 

 

Field delimiter

The default field delimiter is ,. Use the FIELDTERMINATOR option to specify a different field delimiter.

If you try to import a file that doesn’t use , as field delimiter and you also don’t specify a custom delimiter, LOAD CSV will interpret the CSV as having a single column.

LOAD CSV FROM 'file:///artists-fieldterminator.csv' AS row FIELDTERMINATOR ';'
MERGE (:Artist {name: row[1], year: toInteger(row[2])})

 

Quotes escaping

Quoted STRING values are allowed in the CSV file and the quotes are dropped when LOAD CSV reads the data. If quoted STRING values must contain quote characters ", there are two ways to escape them:

  1. Double quotes — Use another quote " to escape a quote (for example, the CSV encoding of the STRING The "Symbol" is "The ""Symbol""").

  2. Prefix with backslash \— If the configuration setting dbms.import.csv.legacy_quote_escaping is set to true (the default value), \ works as the escape character for quotes (for example, the CSV encoding of the STRING The "Symbol" is "The \"Symbol\"").

artists-with-escaped-quotes.csv:

"1","The ""Symbol""","1992"
"2","The \"Symbol\"","1992"

 

Check source data quality

In case of a failed import, there are some elements to check to ensure the source file is not corrupted.

  • Inconsistent headers — The CSV header may be inconsistent with the data. It can be missing, have too many columns or have a different delimiter. Verify that the header matches the data in the file. Adjust the formatting, delimiters or columns.

  • Extra or missing quotes — Standalone double or single quotes in the middle of non-quoted text or non-escaped quotes in quoted text can cause issues reading the file. Either escape or remove stray quotes. See Quotes escaping.

  • Special or newline characters — When dealing with special characters in a file, ensure they are quoted or remove them.

  • Inconsistent line breaks — Ensure line breaks are consistent throughout your file.

  • Binary zeros, BOM byte order mark and other non-text characters — Unusual characters or tool-specific formatting are sometimes hidden in application tools, but become apparent in plain-text editors. If you come across these types of characters in your file, either remove them or use Cypher’s normalize function.

Inspect source files ahead of import

Before importing data into the database, you can use LOAD CSV to inspect a source file and get an idea of what form the imported data is going to have.

// Assert correct line count
LOAD CSV FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS line
RETURN count(*);

 

// Check first 5 line-sample with header-mapping
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS line
RETURN line.person_tmdbId, line.name
LIMIT 5;

 

Other ways of importing data

There are a few other tools to get CSV data into Neo4j.

  1. The neo4j-admin database import command is the most efficient way of importing large CSV files.

  2. Use a language library to parse CSV data and run creation Cypher queries against a Neo4j database. Created as an extension library to provide common procedures and functions to developers. This library is especially helpful for complex transformations and data manipulations. Useful procedures include apoc.load.jdbc, apoc.load.json, and others.

  3. The ETL Tool: allows to extract the schema from a relational database and turn it into a graph model. It then takes care of importing the data into Neo4j.

  4. The Kettle import tool maps and executes steps for the data process flow and works well for very large datasets, especially if you are already familiar with using this tool.

 

posted on   ZhangZhihuiAAA  阅读(26)  评论(0编辑  收藏  举报
相关博文:
阅读排行:
· 全程不用写代码,我用AI程序员写了一个飞机大战
· DeepSeek 开源周回顾「GitHub 热点速览」
· 记一次.NET内存居高不下排查解决与启示
· MongoDB 8.0这个新功能碉堡了,比商业数据库还牛
· 白话解读 Dapr 1.15:你的「微服务管家」又秀新绝活了
 
点击右上角即可分享
微信分享提示