Creating Drupal content using Python
I wanted to import data into Drupal from elsewhere - in my case, from XML exported by another system.
Perhaps the easiest way to interface with Drupal at a simple level is to use the Services module. This lets you, for example, create applications in other languages which can create, list, or update Drupal content. There is a mix of documentation for it, some of which is a bit dated.
So here’s a quick example of how you can use a Python program to create a node in Drupal. Hope this helps to get you started if you need to do something similar.
#! /usr/bin/python
#
# Creating nodes in Drupal 6 via the Services module using
# Python and XML-RPC.
#
# To use this, get and install the Services module from
# http://drupal.org/project/services
#
# Enable the XMLRPC Server module, the System Service module and the
# Node Service module at least.
#
#
# Note that this is intended to demo quick importing of data from other
# systems. It makes no use of authentication, in fact you need to enable
# various permissions for anonymous users to be able to use it.
# Enable things like 'administer nodes', 'create page content',
# 'create url aliases' and 'administer url aliases'.
#
# You should remember, therefore, to DISABLE them before you go live
# with a production site!
#
# This assumes you have 'Use sessid' but not 'Use keys' enabled in the
# Services settings at http://yoursite/admin/build/services/settings.
import xmlrpclib, time
# Put the URL for your Service in here. See admin/build/services.
s = xmlrpclib.ServerProxy('http://myhost.com/services/xmlrpc')
class node:
    # You need to set uid and username appropriately for your site if you don't want
    # everything to be posted by Anonymous.
    def __init__(self, title, body, path, ntype='page', uid=1, username='qsf'):
        self.title = title
        self.body = body
        self.path = path
        self.type = ntype
        self.uid = uid
        self.name = username
        self.promote = False

try:
    sessid, user = s.system.connect()
    # Here you could read in the content for each node from some other source and do
    # n = node(title, body, path)
    # but for now we'll just do:
    n = node('A test node', 'This is an interesting page', 'interesting')
    # and then save it into Drupal
    s.node.save(sessid, n)
    # where it should appear at /interesting.
except xmlrpclib.Fault, err:
    print "A fault occurred"
    print "Fault code: %d" % err.faultCode
    print "Fault string: %s" % err.faultString
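If you are on Python 3, where xmlrpclib has been renamed xmlrpc.client, the same request could be sketched roughly as follows. This is an illustration under assumptions rather than a tested port: the build_node and node_save_request helpers are hypothetical names of my own, the dict simply mirrors the fields the class above sets, and nothing here contacts a server. It only shows the XML-RPC payload a node.save call would carry.

```python
import xmlrpc.client


def build_node(title, body, path, ntype='page', uid=1, username='qsf'):
    """Return the node fields as a plain dict (hypothetical helper)."""
    return {
        'title': title,
        'body': body,
        'path': path,
        'type': ntype,
        'uid': uid,
        'name': username,
        'promote': False,
    }


def node_save_request(sessid, node):
    """Marshal a node.save call to XML without sending it anywhere."""
    return xmlrpc.client.dumps((sessid, node), methodname='node.save')


if __name__ == '__main__':
    n = build_node('A test node', 'This is an interesting page', 'interesting')
    # 'dummy-session' is a placeholder; a real sessid comes from system.connect.
    print(node_save_request('dummy-session', n))
```

To send it for real, an xmlrpc.client.ServerProxy pointed at your Services URL would play the role of s in the script above.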
Importing content from Plone to Drupal
I used a variant of this script to convert a Plone-based site to Drupal. Whether any of the following would work for somebody else’s site depends to a great degree on the particular configurations involved, but I offer it in case it’s useful as inspiration! You’ll almost certainly need to tweak it.
As those familiar with Plone will know, the content is stored in an object-oriented database, and mapping that onto other things can be tricky. Most of my content was reasonably straightforward, though, and I tackled the problem by switching on the Zope options to make the Plone site accessible by FTP. I could then use the FTP tool of my choice to access what looked like a filesystem and copy the whole hierarchy onto another machine.
My Plone content was basically HTML and images, and each node in Plone came across as a simple HTML file without all the boilerplate formatting. Perfect! I could then write a script which would take, for example, the file in news/article1, read it as XML, gather the bits I wanted and create a node in Drupal which also had news/article1 as the path.
The main bit of manual tweaking was to do with relative URLs. In Drupal you really want every link on your site to be an absolute URL (i.e. to begin with ‘/’), because there are so many ways of chopping and displaying your content. On the old site, news/article1 might link to article2 in the same folder, which isn’t going to work if news/article1 is sometimes accessed as node/29, or recentstuff/thisweek, or whatever. You need to look for links that don’t begin either with http or with / and update them appropriately to /news/article2 or whatever. I didn’t do that in the script because there were few enough that I could do it using global searches in my editor.
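If there are too many links to fix by hand, the rewrite described above could be automated along these lines. This is only a sketch, written for Python 3, and it is not part of my import script: absolutize and rewrite_links are hypothetical helper names, and it assumes the page has already been parsed into an ElementTree element and that only href and src attributes matter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlsplit


def absolutize(link, page_path):
    """Resolve a relative link against the page it appeared on."""
    parts = urlsplit(link)
    # Leave empty links, full URLs (http:, mailto:, ...), root-relative
    # paths and in-page anchors alone.
    if not link or parts.scheme or parts.netloc or link.startswith(('/', '#')):
        return link
    return urljoin('/' + page_path.lstrip('/'), link)


def rewrite_links(root, page_path):
    """Rewrite href and src attributes in place on an ElementTree element."""
    for element in root.iter():
        for attr in ('href', 'src'):
            value = element.get(attr)
            if value is not None:
                element.set(attr, absolutize(value, page_path))


if __name__ == '__main__':
    body = ET.fromstring(
        '<body><a href="article2">next</a> <a href="/about">about</a></body>')
    rewrite_links(body, 'news/article1')
    print(ET.tostring(body, encoding='unicode'))
```

Resolving against the original Plone path, rather than whatever URL Drupal later serves the node under, is what keeps a link such as article2 pointing at /news/article2.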
OK, so here’s my import.py script. Having copied the Plone content to my local drive by FTP, I changed into the folder and did, for example:
./import.py news/article1 news/article2 ...
I hope it’s reasonably self-explanatory. I use the tidy utility to convert sometimes non-conformant HTML into valid XML.
#! /usr/bin/python
#
# This assumes you have 'Use sessid' but not 'Use keys' enabled in the
# Services settings at http://yoursite/admin/build/services/settings.
# You'll need the appropriate permissions set, and the Path module enabled.
import xmlrpclib, time, sys, subprocess, os
import xml.etree.ElementTree as ET
# Where is the XML-RPC service on your Drupal site?
s = xmlrpclib.ServerProxy(':8888/services/xmlrpc')
class node:
    # You need to set uid and username appropriately for your site if you don't want
    # everything to be posted by Anonymous.
    def __init__(self, title, body, path, ntype='page', date=None, uid=1, username='qsf'):
        self.title = title
        self.body = body
        self.path = path
        self.type = ntype
        self.uid = uid
        self.name = username
        self.promote = False
        self.format = 3
        self.comment = 0
        if date:
            # self.created = date
            # self.changed = date
            self.date = date
            print "date = ",date

sessid, user = s.system.connect()

for path in sys.argv[1:]:
    try:
        # Read the XML file, tidying it up and making valid XML.
        tidypipe = subprocess.Popen(["tidy", "-q", "-asxml", "-n", path],
                                    stdout=subprocess.PIPE)
        # The 'tidy' process will sometimes add namespace declarations.
        # ElementTree stuff below gets messy if we have to parse namespaces
        # so I'm just going to throw them away before we read the file.
        nsstrip = subprocess.Popen(["sed", "s/^<html .*>$/<html>/"],
                                   stdin=tidypipe.stdout, stdout=subprocess.PIPE)
        # Read the file into a DOM tree
        tree = ET.parse(nsstrip.stdout)
        root = tree.getroot()
        desc = None
        date = None
        ntype = 'page'
        # Some of the info we want is in the <meta> tags.
        for i in root.findall(".//meta"):
            if i.get('name')=='Description':
                desc = i.get('content')
            if i.get('name')=='Effective_date':
                date = i.get('content')
            if i.get('name')=='Type' and i.get('content')=='News Item':
                ntype='story'
        # This is where we build up the body of our new node.
        body = ""
        # We turn what was the 'Description' in Plone into the teaser text in Drupal
        if desc:
            body += "<p>%s</p>\n<!--break-->\n" % desc
        # The title comes from <title>
        title = root.find('.//title').text
        # and the rest from <body>
        body_outer = root.find('.//body')
        # The <body> may have text that isn't inside any element
        # (.text is None when there isn't any).
        body += body_outer.text or ""
        # then we add all the elements.
        for n in body_outer:
            body += ET.tostring(n)
        # Strip off index_html, if that's part of the path
        path = path.replace('/index_html', '')
        # Show the user what's happening
        print "Title:",title
        print "Desc: ",desc
        print "Date: ",date
        # And create a new node
        n = node(title, body, path, ntype=ntype, date=date)
        s.node.save(sessid, n)
        # On a Mac this will open it in the browser so you can check it out.
        # os.system('open %s/%s' % (':8888', path))
        print "------------------------------------------"
    except xmlrpclib.Fault, err:
        print "======================================"
        print "A fault occurred with",path
        print "Fault code: %d" % err.faultCode
        print "Fault string: %s" % err.faultString
        print "======================================"
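If you would rather not shell out to sed, the namespace handling and field gathering might be done in Python alone. What follows is a rough sketch for Python 3, not the code I actually ran: strip_namespaces and extract_fields are hypothetical helpers, and they assume the input is already well-formed XHTML (for example, the output of tidy) held in a string.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Drop '{namespace}' prefixes from every tag name, in place."""
    for element in root.iter():
        if isinstance(element.tag, str) and '}' in element.tag:
            element.tag = element.tag.split('}', 1)[1]
    return root


def extract_fields(xhtml_text):
    """Pull title, date, node type and body out of tidied XHTML."""
    root = strip_namespaces(ET.fromstring(xhtml_text))
    # Collect <meta name="..." content="..."> pairs into a dict.
    meta = {m.get('name'): m.get('content') for m in root.iter('meta')}
    desc = meta.get('Description')
    body_outer = root.find('.//body')
    body = ''
    if desc:
        # Teaser text, followed by Drupal's teaser break marker.
        body += '<p>%s</p>\n<!--break-->\n' % desc
    # Text directly inside <body>, then each child element in turn.
    body += body_outer.text or ''
    for child in body_outer:
        body += ET.tostring(child, encoding='unicode')
    return {
        'title': root.find('.//title').text,
        'date': meta.get('Effective_date'),
        'type': 'story' if meta.get('Type') == 'News Item' else 'page',
        'body': body,
    }
```

Renaming the tags after parsing means the later find calls and the serialised body are free of namespace prefixes, which is the same effect the sed step aims for.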
I don’t normally have comments switched on on this site, so if you’d like to add any comments/tips, it might be best to do so on the Status-Q post.
Quentin Stafford-Fraser, Dec 2008.