Getting OSM data ready for MongoDB
In the following, we will download JSON documents representing node, ways and relations from OpenStreetMap using the overpass API. Then, we will transform this data in order to have more nested documents.
Downloading data
I will focus on the area of Clermont-Ferrand, but you can do the same with your area. I will present the different Overpass queries I used to download the three types OSM elements I need. Once you have selected the area using the map and you have typed the query, you can download the corresponding data with "Export>raw data from Overpass API".
For nodes
[out:json]; ( node [name] ({{bbox}}); node [amenity] ({{bbox}}); node [natural] ({{bbox}}); ); out;
For ways
[out:json]; way [name] (1,{{bbox}}); (._;>;); out;
For relations
[out:json]; relation [name] (1,{{bbox}}); (._;>;); out;
In my example, I just need the three following commands:
wget https://overpass-api.de/api/interpreter?data=%2F*%0AThis%20is%20an%20example%20Overpass%20query.%0ATry%20it%20out%20by%20pressing%20the%20Run%20button%20above%21%0AYou%20can%20find%20more%20examples%20with%20the%20Load%20tool.%0A*%2F%0A%5Bout%3Ajson%5D%3B%0A%28%0Anode%0A%20%20%5Bname%5D%0A%20%20%2845.72631510756138%2C3.0054473876953125%2C45.83729122987253%2C3.1865501403808594%29%3B%0Anode%0A%20%20%5Bamenity%5D%0A%20%20%2845.72631510756138%2C3.0054473876953125%2C45.83729122987253%2C3.1865501403808594%29%3B%0Anode%0A%20%20%5Bnatural%5D%0A%20%20%2845.72631510756138%2C3.0054473876953125%2C45.83729122987253%2C3.1865501403808594%29%3B%0A%20%20%29%3B%0Aout%3B -O clermont-node.json wget https://overpass-api.de/api/interpreter?data=%0A%5Bout%3Ajson%5D%3B%0A%0Away%0A%20%20%5Bname%5D%0A%20%20%2845.726434941923486%2C3.0152320861816406%2C45.8374108259422%2C3.1963348388671875%29%3B%0A%0A%28._%3B%3E%3B%29%3B%0A%0Aout%3B -O clermont-way.json wget https://overpass-api.de/api/interpreter?data=%2F*%0AThis%20is%20an%20example%20Overpass%20query.%0ATry%20it%20out%20by%20pressing%20the%20Run%20button%20above%21%0AYou%20can%20find%20more%20examples%20with%20the%20Load%20tool.%0A*%2F%0A%5Bout%3Ajson%5D%3B%0Arelation%0A%20%20%5Bname%5D%0A%20%20%2845.72631510756138%2C3.0054473876953125%2C45.83729122987253%2C3.1865501403808594%29%3B%0A%0A%2F*added%20by%20auto%20repair*%2F%0A%28._%3B%3E%3B%29%3B%0A%2F*end%20of%20auto%20repair*%2F%0Aout%3B -O clermont-relation.json
Extracting an array of OSM elements
The JSON document provided by the Overpass API is not ready to be loaded in MongoDB, we first need to project it on the elements array.
jq .elements download-file.json > file-to-load.json
In my case, I have
jq .elements clermont-node.json > clermont-nodes.json jq .elements clermont-way.json > clermont-ways.json jq .elements clermont-relation.json > clermont-relations.json
Ideas of interesting queries
Query the parks
On which place is the Tabac du Mazet ?
db.clermont.find({type: 'node','tags.name': RegExp("Tabac du Mazet")}) db.clermont.find({type: 'relation', members: {$elemMatch: {ref: 170201638}}})
Create a 2d index requires to change the structure of the documents to include GeoJSON information.
Create a small search engine using an text index (you should have results for "Usinage", "Le Rio", +10 results for "association")
What is the mean elevation of the peaks
db.clermont.aggregate([{$match: {"node.tags.highway": "traffic_signals"}}, {$project: {"tags.name": 1}}])
About ISIMA
db.clermont.find({"tags.name": /ISIMA/})
Cleaning and restructure the data noexports
// count the duplicates db.clermont.aggregate([{$group: { _id: "$id", uniqueIds: {$addToSet: "$_id"}, count: {$sum: 1}}}, {$match: {count: {$gt: 1}}}, {"$count": "count"}]) // remove them db.clermont.aggregate([{$group: { _id: "$id", uniqueIds: {$addToSet: "$_id"}, count: {$sum: 1}}}, {$match: {count: {$gt: 1}}}]).forEach(function(doc){doc.uniqueIds.shift(); db.clermont.remove({_id: {$in: doc.uniqueIds}})}) db.newclermont.aggregate([{"$set": {"_id": "$id"}},{$unset: "id"}, {$merge: {into:"newclermont2"}}]) db.newclermont2.aggregate([{"$set": {"url": {"$concat": ["https://www.openstreetmap.org/", "$type", "/", {"$toString": "$_id"}]}}}, {$merge: {into:"newclermont2"}}]) db.newclermont2.aggregate([{ $match: { type: 'way' } }, { $unwind: { path: '$nodes' } }, { $lookup: { from: 'newclermont2', localField: 'nodes', foreignField: '_id', as: 'node' } }, { $unwind: { path: '$node' } }, { $unset: 'nodes' }, { $group: { _id: '$_id', nodes: { $push: '$node' }, obj: { $first: '$$ROOT' } } }, { $set: { 'obj.nodes': '$nodes' } }, { $replaceRoot: { newRoot: '$obj' } }, { $unset: 'node' }, {$merge: {into:"newclermont3"}}]) db.newclermont2.aggregate([{$match: {type: {$in:["relation", "node"]}}}, {$merge: {into: "newclermont3"}}]) // remove the _id of the node in the ways db.clermont.updateMany({nodes: {$exists : true}}, {$unset: { "nodes.$[]._id": ""}}) // list the ref ids in the relations db.clermont.aggregate([{$match: {type: "relation"}}, {$project: {"members":1}}, {$unwind: "$members"}, {$group:{_id: null, refs: {$addToSet:"$members.ref"}}}])