EBD (XML) Quick Guide


This quick guide gives a very brief overview of what the EBD files are, where they come from, their format and their data content. For a full description, see the documentation listed on the information page: [Info]

What are EBD files?

EBD stands for EPO Bibliographic Data. The files consist of EPO patent data marked up using XML tags. The data gives details of all newly published applications and patents, and of recent changes to previously published patents; for example, date of filing, titles, applicant data, classification and priority data. Exactly the same data is used to print the EP Bulletin. These are complex files which require a good understanding of EPO patent procedures.

Where does the data come from?

Once per week the EPO carries out a so-called 'publication run', which extracts from EPASYS (the EPO's administrative database) the data required for the publication of the EP Bulletin and of the EP A (patent applications) and B (granted patents) publications. The data is therefore a 'photograph' of all the previous week's additions, changes, etc. in EPASYS.
Note: EPASYS is our "bible". If there is a mistake in EPASYS (e.g. in data entry) then there will be a mistake in the EBD file; we process the data "as is".

It must be emphasised that the EBD files are not a cumulative record of all the changes which may have been made to a file (for this, users should refer to the Register of European Patents on the Internet); they are a snapshot of only those changes which occurred in a particular publication week. In addition, there is a clear distinction between A and B publications (separate files), since these are separate documents for publication purposes. The B file is therefore not an update of the A file; it is a file in its own right (extracted from EPASYS), although, of course, much of the data may be the same in both files.

The same data is used to produce the weekly EP Bulletin (internet and CD-ROM) and for data exchange purposes. It is at this time that the EBD files are created. Downloading the data in the native format of the EPASYS (DB2) database would obviously not be of much use to most people. We therefore convert the data into XML tagged data (as we do for the full text of all our patents, not only the bibliographic data). This means the data is in a system-independent format; there is more detail below.

The EPO Publication server contains the full text of all EP patent documents (A and B). It is another source, in XML, not only of the bibliographic data but also of the abstract, description, claims, etc. of the complete patent.

The abstract data is captured by an EPO contractor and loaded onto our computers, in XML, once per week; based on the list for the EBD data we extract the relevant A1, A2 and A3 abstracts.

The file format

The data on this site is delivered in one format:

In WIPO ST.36 format as a 'flat' UTF-8 file.

The data on the site is contained in one zip file per publication week; for example, for publication week 1 of 2006 it is st360601.zip. Within this zip file there are two files (which will be quite large): one for A documents, e.g. s360601a.txt, and one for B documents, e.g. s360601b.txt. These text files are not themselves compliant (or valid) XML files. They simply contain numerous valid XML instances (separate bibliographic files), which must be extracted from the text files for correct processing. The .zip and .txt formats are simply used to transport the data in an easy and efficient way.
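
The extraction step could be sketched in Python as follows. This is only an illustration, not an EPO tool: it assumes that every instance in the text file begins with an XML declaration ("<?xml ..."), and the function names are hypothetical. It reads the whole file into memory, which may not suit very large files.

```python
import re
import zipfile
from typing import Iterator, List

# Assumption: every XML instance in the flat file starts with "<?xml ...".
_XML_DECL = re.compile(r"<\?xml\s")


def split_instances(text: str) -> List[str]:
    """Split the content of a flat EBD text file into separate XML instances."""
    starts = [m.start() for m in _XML_DECL.finditer(text)]
    # Each instance runs from its declaration up to the next declaration.
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def iter_instances_from_zip(zip_path: str, member: str) -> Iterator[str]:
    """Yield the instances held in one member (A or B file) of a weekly zip."""
    with zipfile.ZipFile(zip_path) as archive:
        text = archive.read(member).decode("utf-8")
    yield from split_instances(text)
```

For example, iter_instances_from_zip("st360601.zip", "s360601a.txt") would yield the A documents of week 1 of 2006 one by one, provided the zip file has been downloaded to the working directory.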

The character set used for this data is UTF-8 (Unicode) plus a few character entity references. Care should be taken when viewing this data that your system is set to read UTF-8; otherwise certain characters may not display or print correctly. You may have to write a small conversion program.
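
A small conversion program of this kind might look like the Python sketch below. It is one possible approach only: the function name is hypothetical, and it assumes the target system can only handle ASCII. Every non-ASCII character is replaced by an XML numeric character reference, so no information is lost.

```python
def utf8_to_ascii_charrefs(data: bytes) -> bytes:
    """Convert UTF-8 bytes to pure ASCII using numeric character references."""
    text = data.decode("utf-8")
    # "xmlcharrefreplace" turns e.g. "ü" into "&#252;", which an XML parser
    # reads back as the same character.
    return text.encode("ascii", errors="xmlcharrefreplace")
```

Since ASCII is a subset of UTF-8, the encoding="UTF-8" declaration at the start of each instance remains correct after the conversion.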

Note: The format of the XML document below is adjusted for readability!

The data

Each bibliographic file starts with:

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE ep-bulletin PUBLIC "-//EPO//EP BULLETIN 1//EN" "ep-bulletin-v1-0.dtd">
  <ep-bulletin
    file="04102988.5" lang="en" country="EP" doc-number="1613106" kind="A1"
    date-publ="20060104" status="n" dtd-version="ep-bulletin-v1-0">
Following this is a sub-document tag which starts the bibliographic data; it has a language attribute:
  <SDOBI lang="fr">
Note: it is a sub-document tag because, when we process the whole file, other sub-documents such as abstract, description, claims, etc. are added. The file is then an ep-patent-document file (with a different DTD, Document Type Definition) and not an ep-bulletin file.
Note: the abstract files on this site use the ep-patent-document.dtd.

The next two tags:

<B000> <eptags>

indicate the start of office specific data.
        <B002EP>                      --- indicates data that has been changed from a previous document
            <ep-chg idref="ep-chg0001" btag="B840" date="20041113" status="n"/>
            <ep-chg idref="ep-chg0002" btag="B844EP" date="20041113" status="n"/>
        <B007EP>DIM360 (Ver 1.5 26 Jan 2004) - 1500000/0</B007EP>
</eptags> </B000>

There then follow ST.36 tags common to all types of patents; these follow INID code order (WIPO ST.9) for published patents:
    <B100>                            --- Document identification

    <B200>                            --- Domestic filing data

    <B300>                            ---  Priority data

    <B400>                            ---  Public availability dates

    <B500>                            --- Technical data
      <B510EP>                        --- New IPCR layout (see WIPO ST.8)
        <classification-ipcr sequence="1">
          <text>G06F 17/30       20000101AFI20050105BHEP       </text>
        <B542>Geräte und Verfahren zum Extrahieren von Bildmerkmalen und Wiederauffinden von Bildern</B542>
        <B542>Picture feature extraction device, picture retrieving device, and methods thereof for picture feature extraction and retrieving picture</B542>
        <B542>Appareils et méthodes pour extraire des caractéristiques d'images et pour le recouvrement d'images</B542>

    <B700>                                --- Parties concerned with the document
      <B710>                              --- Applicants
          <snm>NEC CORPORATION</snm>
          <irf>bf-DP-591 EP</irf>
            <str>7-1, Shiba 5-chome, Minato-ku</str>
      <B720>                              --- Inventors
          <snm>Kasutani, Eiji</snm>
            <str>c/o NEC Corporation, 7-1, Shiba 5-chome</str>
            <city>Minato-ku, Tokyo</city>
      <B740>                              --- Attorneys/agents
          <snm>Glawe, Delfs, Moll &amp; Partner</snm>
            <str>Patentanwälte Postfach 26 01 62</str>
            <city>80058 München</city>

    <B800>                                --- International convention data
      <B840 id="ep-chg0001">              --- Designated contracting states
      <B844EP id="ep-chg0002">

This is the end of the sub-document tag (which started the bibliographic data):
</SDOBI>
Each bibliographic file ends with:
</ep-bulletin>                            --- end of file
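
As an illustration of how one extracted instance might be read, the Python sketch below pulls a few fields out of a single bibliographic file using the standard library. It is not an official parser. The function name and the SAMPLE text are hypothetical; SAMPLE is a hand-built, heavily simplified instance that uses only tags shown above, and real instances contain far more data and nesting. The standard library parser does not fetch the DTD, so it will raise an error if an instance uses a named character entity reference that is only defined there.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Hand-built, simplified instance for illustration only (not real EBD output).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ep-bulletin PUBLIC "-//EPO//EP BULLETIN 1//EN" "ep-bulletin-v1-0.dtd">
<ep-bulletin file="04102988.5" lang="en" country="EP" doc-number="1613106"
    kind="A1" date-publ="20060104" status="n" dtd-version="ep-bulletin-v1-0">
<SDOBI lang="fr">
<B500><B542>Title one</B542><B542>Title two</B542></B500>
<B700>
<B710><snm>NEC CORPORATION</snm></B710>
<B720><snm>Kasutani, Eiji</snm></B720>
</B700>
</SDOBI>
</ep-bulletin>"""


def summarize_instance(xml_text: str) -> Dict[str, Union[str, List[str]]]:
    """Return document number, kind, titles and applicant names of one instance."""
    # Parse from bytes so the encoding declaration in the instance is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    applicants = [
        snm.text or ""
        for b710 in root.iter("B710")   # applicants only, not inventors/agents
        for snm in b710.iter("snm")
    ]
    return {
        "doc_number": root.get("doc-number", ""),
        "kind": root.get("kind", ""),
        "titles": [title.text or "" for title in root.iter("B542")],
        "applicants": applicants,
    }
```

Calling summarize_instance(SAMPLE) returns a dictionary with the document number, the kind code, the list of titles and the list of applicant names. Searching with iter() rather than fixed paths is a deliberate choice here, because the excerpt above does not show the full nesting of the tags.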

How you use this data is a matter for you: we cannot give support for database building, formatting, etc. If you already have a database of patent data, any competent programmer should be able to build a conversion table fairly quickly. Similarly, formatting the data for output as, say, HTML is also straightforward.

However, there is a program available to EPO National Offices which can build databases using the EBD ST.36 file as an input. This is the SOPRANO project (previously the Common Software/Spirit).

For more details, please contact the epoline helpdesk.