The IPTC News Architecture Working Group is happy to announce the release of NewsML-G2 version 2.34.

This version, approved at the IPTC Standards Committee Meeting at the New York Times offices on Wednesday 17th April 2024, contains one small change and one additional feature:

Change Request 218, increase nesting of <related> tags: this allows for <related> items to contain child <related> items, up to three levels of nesting. This can be applied to many NewsML-G2 elements:

  • pubHistory/published
  • QualRelPropType (used in itemClass, action)
  • schemeMeta
  • ConceptRelationshipsGroup (used in concept, event, Flex1PropType, Flex1RolePropType, FlexPersonPropType, FlexOrganisationPropType, FlexGeoAreaPropType, FlexPOIPropType, FlexPartyPropType, FlexLocationPropType)

Note that we chose not to allow for recursive nesting because this caused problems with some XML code generators and XML editors.
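
As a rough illustration of the nesting limit, the sketch below parses a small concept and measures how deep its <related> elements nest. The QCodes (ex:a, ex:rel and so on) are hypothetical placeholders, and the helper assumes the outermost <related> counts as the first of the three levels.

```python
import xml.etree.ElementTree as ET

# NewsML-G2 namespace
NAR = "http://iptc.org/std/nar/2006-10-01/"
RELATED = f"{{{NAR}}}related"


def related_depth(elem):
    """Return the largest number of <related> elements nested on any path at or below elem."""
    deepest_child = max((related_depth(child) for child in elem), default=0)
    return deepest_child + (1 if elem.tag == RELATED else 0)


# Hypothetical sample: every qcode and rel value here is made up for illustration.
SAMPLE = f"""
<concept xmlns="{NAR}">
  <conceptId qcode="ex:a"/>
  <related rel="ex:rel" qcode="ex:b">
    <related rel="ex:rel" qcode="ex:c">
      <related rel="ex:rel" qcode="ex:d"/>
    </related>
  </related>
</concept>
""".strip()

MAX_LEVELS = 3  # assumption: the outermost <related> is level one

root = ET.fromstring(SAMPLE)
depth = related_depth(root)
print("related nesting depth:", depth, "within limit:", depth <= MAX_LEVELS)
```

Schema validation against the NewsML-G2 XSD remains the authoritative check; a helper like this is only useful as a quick guard in code that builds items.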

Change Request 219, add dataMining element to rightsInfo: In accordance with other IPTC standards such as the IPTC Photo Metadata Standard and Video Metadata Hub, we have now added a new element to the <rightsInfo> block to convey a content owner’s wishes in terms of data mining of the content. We recommend the use of the PLUS Vocabulary that is also recommended for the other IPTC standards: https://ns.useplus.org/LDF/ldf-XMPSpecification#DataMining

Here are some examples of its use:

Denying all Generative AI / Machine Learning training using this content:

<rightsInfo>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-AIMLTRAINING"/>
</rightsInfo>

A simple text-based constraint:

<rightsInfo>
  <usageTerms>
    Data mining allowed for academic and research purposes only.
  </usageTerms>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>

A simple text-based constraint, expressed using a QCode instead of a URI:

<rightsInfo>
  <usageTerms>
    Reprint rights excluded.
  </usageTerms>
  <dataMining qcode="plusvocab:DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>
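
The QCode and URI forms above are intended to identify the same concept. A minimal sketch of the expansion follows, assuming the item's catalog maps the plusvocab alias to the scheme URI http://ns.useplus.org/ldf/vocab/ (inferred from the URI examples in this post, not taken from a published catalog). The expand_qcode helper is hypothetical.

```python
def expand_qcode(qcode, catalog):
    """Expand 'alias:code' into a full concept URI using an alias-to-scheme-URI map."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in catalog:
        raise ValueError(f"cannot resolve QCode {qcode!r}")
    return catalog[alias] + code


# Assumed catalog entry; a real item declares its scheme aliases in its own catalog.
CATALOG = {"plusvocab": "http://ns.useplus.org/ldf/vocab/"}

print(expand_qcode("plusvocab:DMI-PROHIBITED-SEECONSTRAINT", CATALOG))
```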

A text-based constraint expressed in both English and French:

<rightsInfo>
  <usageTerms xml:lang="en">
    Reprint rights excluded.
  </usageTerms>
  <usageTerms xml:lang="fr">
    droits de réimpression exclus
  </usageTerms>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>

Using the “see embedded rights expression” constraint to express a complex machine-readable rights expression in RightsML:

<rightsInfo>
  <rightsExpressionXML langid="http://www.w3.org/ns/odrl/2/">
    <!-- RightsML goes here... -->
  </rightsExpressionXML>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEEEMBEDDEDRIGHTSEXPR"/>
</rightsInfo>
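
For consumers of these items, the following sketch shows one way to read the data mining declaration and any accompanying usage terms with Python's standard library. It assumes the <rightsInfo> block is in the NewsML-G2 namespace, as it would be inside a real item. The read_data_mining helper is hypothetical, not part of any IPTC tooling.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_data_mining(rights_info):
    """Return (list of dataMining uri/qcode values, usage terms keyed by xml:lang)."""
    values = [
        dm.get("uri") or dm.get("qcode")
        for dm in rights_info.findall("nar:dataMining", NS)
    ]
    terms = {}
    for usage in rights_info.findall("nar:usageTerms", NS):
        # Collapse the indentation whitespace around the text
        terms[usage.get(XML_LANG, "")] = " ".join((usage.text or "").split())
    return values, terms


SAMPLE = """
<rightsInfo xmlns="http://iptc.org/std/nar/2006-10-01/">
  <usageTerms xml:lang="en">
    Reprint rights excluded.
  </usageTerms>
  <usageTerms xml:lang="fr">
    droits de réimpression exclus
  </usageTerms>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>
""".strip()

values, terms = read_data_mining(ET.fromstring(SAMPLE))
print(values)
print(terms)
```

When the value is one of the "see constraint" terms, the human-readable usage terms carry the actual restriction, so a consumer would normally surface both together.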

For more information, contact the IPTC News Architecture Working Group via the public NewsML-G2 mailing list.

(Most of) the IPTC Board of Directors gathering outside The New York Times offices for the IPTC Spring Meeting 2024.

Last week, the IPTC Spring Meeting 2024 brought media industry experts together for three days in New York City to discuss many topics including AI, archives and authenticity.

Hosted by both The New York Times and Associated Press, over 50 attendees from 14 countries participated in person, with another 30+ delegates attending online.

As usual, the IPTC Working Group leads presented a summary of their most recent work, including a new release of NewsML-G2 (version 2.34, which will be released very soon); forthcoming work on ninjs to support events, planned news coverage and live streamed video; updates to NewsCodes vocabularies; more evangelism of IPTC Sport Schema; and further work on Video Metadata Hub, the IPTC Photo Metadata Standard and our emerging framework for a simple way to express common rights statements using RightsML.

We were very happy to hear many IPTC member organisations presenting at the Spring Meeting:

  • Anna Dickson of recently-joined member Google talked about their work with IPTC in the past and discussed areas where we could collaborate in the future
  • Aimee Rinehart of Associated Press presented AP’s recent report on the use of generative AI in local news
  • Scott Yates of JournalList gave an update on the trust.txt protocol
  • Andreas Mauczka, Chief Digital Officer at Austria Press Agency APA, presented on APA’s framework for the use of generative AI in their newsroom
  • Drew Wanczowski of Progress Software gave a demonstration of how IPTC standards can be implemented in Progress’s tools such as Semaphore and MarkLogic
  • Vincent Nibart and Geert Meulenbelt of new IPTC Startup Member Kairntech presented on their recent work with AFP on news categorisation using IPTC Media Topics and other vocabularies
  • Mathieu Desoubeaux of IPTC Startup Member IMATAG presented their work, also with AFP, on watermarking images for tracking and metadata retrieval purposes

In addition we heard from guest speakers:

  • Jim Duran of the Vanderbilt TV News Archive spoke about how they are using AI to catalog and tag their extensive archive of decades of broadcast news content
  • John Levitt of Elvex spoke about their system which allows media organisations to present a common interface (web interface and developer API) to multiple generative AI models, including tracking, logging, cost monitoring, permissions and other governance features which are important to large organisations using AI models.
  • Toshit Panigrahi, co-founder of TollBit, spoke about their platform for “AI content licensing at scale”, allowing content owners to establish rules and monitoring around how their content should be licensed for both the training of AI models and for retrieval-augmented generation (RAG)-style on-demand content access by AI agents.
  • We also heard an update about the TEMS – Trusted European Media Data Space project. 

We were also lucky enough to take tours of the Associated Press Corporate Archive on Tuesday and the New York Times archive on Wednesday. Valerie Komor of AP Corporate Archives and Jeff Roth of The New York Times Archival Library (known to staffers as “the morgue”) both gave fascinating insights and stories about how the two archives preserve the legacy of these historically important news organisations.

Brendan Quinn, speaking for Judy Parnall of the BBC, also presented an update of the recent work of C2PA and Project Origin and introduced the new IPTC Media Provenance Committee, dedicated to bringing C2PA technology to the news and media industry.

On behalf of all attendees, we would like to thank The New York Times and Associated Press for hosting us, and especially to thank Jennifer Parrucci of The New York Times and Heather Edwards of The Associated Press for their hard work in coordinating use of their venues for our meeting.

The next IPTC Member Meeting will be the 2024 Autumn Meeting, which will be held online from Monday September 30th to Wednesday October 2nd, and will include the 2024 IPTC Annual General Meeting. The Spring Meeting 2025 will be held in Western Europe at a location still to be determined.

Origin Verified Publisher checkmark logo
The group has created the “Origin Verified Publisher” graphic to convey the fact that content has been signed by a certificate granted to a publisher that has been verified according to the Project Origin process.

The International Press Telecommunications Council, in conjunction with Project Origin, has established a working group to create and manage a C2PA compatible list of verified news publishers.

The open C2PA 2.0 Content Credentials standard for media provenance is widely supported as a strong defence against misinformation. Recent announcements by OpenAI, Meta, Google and others have confirmed the value of an interoperable, tamper-evident way of confirming the source and technical integrity of digital media content.

Project Origin, as a co-founder of the C2PA, has brought the needs of the news publishing community to the forefront of the creation of this standard. This now includes the creation of a C2PA 2.0 compatible Origin Verified Publisher Certificate to be used by publishers to securely create a cryptographic seal on their content. The signing certificates will be available through the IPTC, who will work with C2PA validators to gain widespread acceptance. These signing certificates will be issued by the IPTC to broadcast, print and digital native media publishers.

Origin Verified Publisher Certificates will ensure that the identity of established news organisations is protected from imposters. The certificates confirm organisational identity and do not make any judgement on editorial position. Liaison agreements with other groups in the media ecosystem will be used to accelerate the distribution of certificates.

The initial implementation uses TruePic as a certificate authority, with the BBC and CBC/Radio-Canada as trial participants.

“As a founding partner of Project Origin, CBC/Radio-Canada is proud to be one of the first media organisations to trial Origin Verified Publisher Certificates,” said Claude Galipeau, Executive Vice-President, Corporate Development, CBC/Radio-Canada. “This initiative will provide our audiences with a new and easy way of confirming that the content they’re consuming is legitimately from Canada’s national public broadcaster. It’s an important step in our adoption of the Content Credentials standard and in our fight against misinformation and disinformation.”

Jatin Aythora, Director of BBC R&D, and vice chair for Partnership on AI, said “Media provenance increases trust and transparency in news, and so is an essential tool in the fight against disinformation. That fight has never been more important, and so we hope many more media organisations will join us in securing their own Origin Verified Publisher Certificate.”

Publishers interested in working cooperatively to advance the implementation of the C2PA standard in the news ecosystem are invited to join the Media Provenance Committee of the IPTC.

For further information please contact:

  • Judy Parnall – judy.parnall@bbc.co.uk – representing the BBC
  • Bruce MacCormack – bruce@neuraltransform.com – representing CBC/Radio-Canada
  • Brendan Quinn – mdirector@iptc.org – representing the IPTC

 

The 2024 IPTC Photo Metadata Conference takes place as a webinar on Tuesday 7th May from 1500 – 1800 UTC. Speakers hail from Adobe (makers of Photoshop), CameraBits (makers of PhotoMechanic), Numbers Protocol, Colorhythm, vAIsual and more.

First off, IPTC Photo Metadata Working Group co-leads, David Riecks and Michael Steidl, will give an overview of what has been happening in the world of photo metadata since our last Conference in November 2022, including IPTC’s work on metadata for AI labelling, “do not train” signals, provenance, diversity and accessibility.

Next, a panel session on “AI and Image Authenticity: Bringing trust back to photography?” discusses approaches to the problem of verifying trust and credibility for online images. The panel features C2PA lead architect Leonard Rosenthol (Adobe), Dennis Walker (Camera Bits), Neal Krawetz (FotoForensics) and Bofu Chen (Numbers Protocol).

Next, James Lockman of Adobe presents the Custom Metadata Panel, which is a plugin for Photoshop, Premiere Pro and Bridge that allows for any XMP-based metadata schema to be used – including IPTC Photo Metadata and IPTC Video Metadata Hub. James will give a demo and talk about future ideas for the tool.

Finally, a panel on “AI-Powered Asset Management: Where does metadata fit in?” discusses the relevance of metadata in digital asset management systems in an age of AI. Speakers include Nancy Wolff (Cowan, DeBaets, Abrahams & Sheppard, LLP), Serguei Fomine (IQPlug), Jeff Nova (Colorhythm) and Mark Milstein (vAIsual).

The full agenda and links to register for the event are available at https://iptc.org/events/photo-metadata-conference-2024/

Registration is free and open to anyone who is interested.

See you there on Tuesday 7th May!

The IPTC Photo Metadata Working Group has updated the IPTC Photo Metadata User Guide, including guidance for accessibility and for tagging AI-generated images with metadata.

The updates to the User Guide are across several areas:

Please let us know if you spot any other areas of the user guide that should be updated or if you have suggestions for more guidance that we could give.

After many years of working together in various areas related to media metadata, IPTC, the global technical standards body of the news media, today announces that Google LLC has joined IPTC as a Voting Member.

As a Voting Member, Google will take part in all decisions regarding IPTC standards and delegates will contribute to shaping the standards as they evolve. This important work will happen alongside IPTC’s 26 other Voting Member companies. 

“Google has worked with IPTC standards for many years, so it is great to see them join IPTC so that they can take part in shaping those standards in the future,” said Robert Schmidt-Nia of DATAGROUP, Chair of the Board of IPTC. “We look forward to working together with Google on our shared goals of making information usable and accessible.”

“Google has a long history of working with the IPTC, and we are very happy to now have joined the organization,” Anna Dickson, Product Manager at Google, said. “Joining aligns with our efforts to help provide more information and context to people online. We think this is critical to increasing trust in the digital ecosystem as AI becomes more ubiquitous.”

Google’s work together with IPTC started back in 2010 when schema.org, a joint project managed by Google on behalf of search engines, adopted IPTC’s rNews schema as the basis for schema.org’s news properties such as NewsArticle and CreativeWork. In 2016, the IPTC was a recipient of a Google News Initiative grant to develop the EXTRA rules-based metadata classification engine.

Google staff spoke at the Photo Metadata Conference (co-hosted with CEPIC) in 2018, which led to Google and the IPTC working together (along with CEPIC) on adding support for copyright, credit and licensing information in Google image search results. This has continued to include support for the Digital Source Type property which will now be used to signal content created by Generative AI engines.

The latest update to IPTC NewsCodes, the 2024-Q1 release, was published on Thursday 28th March.

This release includes many updates to our Media Topic subject vocabulary, plus changes to Content Production Party Role, Horse Position, Tournament Phase, Soccer Position, Genre, User Action Type and Why Present.

UPDATE on 11 April: we released a small update to the Media Topics, including Norwegian (no-NB and no-NN) translations of the newly added terms, thanks to Norwegian news agency NTB.

We also made one label change in German: medtop:20000257 from “Alternative-Energie” to “Erneuerbare Energie”. This change was made at the request of German news agency dpa.

Changes to Media Topics vocabulary

As part of the regular review undertaken by the NewsCodes Working Group, many changes were made to the economy, business and finance branch of Media Topics. In addition, a number of changes were made to the conflict, war and peace branch in response to suggestions made by new IPTC member ABC Australia.

5 new concepts: sustainability, profit sharing, corporate bond, war victims, missing in action.

12 retired concepts: justice, restructuring and recapitalisation, bonds, budgets and budgeting, consumers, consumer issue, credit and debt, economic indicator, government aid, investments, prices, soft commodities market.

55 modified concepts: peacekeeping force, genocide, disarmament, prisoners of war, war crime, judge, economy, economic trends and indicators, business enterprise, central bank, consumer confidence, currency, deflation, economic growth, gross domestic product, industrial production, inventories, productivity, economic organisation, emerging market, employment statistics, exporting, government debt, importing, inflation, interest rates, international economic institution, international trade, trade agreements, balance of trade, trade dispute, trade policy, monetary policy, mortgages, mutual funds, recession, tariff, market and exchange, commodities market, energy market, debt market, foreign exchange market, loan market, loans and lending, study of law, disabilities, mountaineering, sport shooting, sport organisation, recreational hiking and climbing, start-up and entrepreneurial business, sharing economy, small and medium enterprise, sports officiating, bmx freestyle.

48 concepts with modified names/labels: judge, emergency incident, transport incident, air and space incident, maritime incident, railway incident, road incident, restructuring and recapitalisation, economic trends and indicators, exporting, importing, interest rates, balance of trade, mortgages, commodities market, soft commodities market, loans and lending, study of law, disabilities, mountain climbing, mountaineering, sport shooting, sport organisation, recreational hiking and climbing, start-up and entrepreneurial business, sports officiating, bmx freestyle, tsunami, healthcare industry, developmental disorder, depression, anxiety and stress, public health, pregnancy and childbirth, fraternal and community group, cyber warfare, public transport, taxi and ride-hailing, shared transport, business reporting and performance, business restructuring, commercial real estate, residential real estate, podcast, financial service, business service, news industry, diversity, equity and inclusion.

57 modified definitions: war crime, economy, economic trends and indicators, business enterprise, central bank, consumer confidence, currency, deflation, economic growth, economic organisation, emerging market, employment statistics, exporting, government debt, importing, inflation, interest rates, international economic institution, trade agreements, trade dispute, trade policy, mortgages, recession, tariff, market and exchange, commodities market, energy market, soft commodities market, debt market, foreign exchange market, loan market, loans and lending, disabilities, mountaineering, sport organisation, start-up and entrepreneurial business, sharing economy, small and medium enterprise, tsunami, healthcare industry, developmental disorder, depression, anxiety and stress, public health, pregnancy and childbirth, cyber warfare, public transport, taxi and ride-hailing, shared transport, business reporting and performance, business restructuring, commercial real estate, residential real estate, podcast, financial service, business service, news industry.

22 modified broader terms (hierarchy moves): peacekeeping force, genocide, disarmament, prisoners of war, business enterprise, central bank, consumer confidence, currency, gross domestic product, industrial production, inventories, productivity, economic organisation, emerging market, interest rates, international economic institution, international trade, monetary policy, mutual funds, tariff, loans and lending, bmx freestyle.

These changes are already available in the en-GB, en-US and Swedish (sv) language variants. Thanks go to TT and Bonnier News for their work on the Swedish translation.

If you would like to contribute or update a translation to your language, please contact us.

Sports-related NewsCodes updates

We also made some changes to our sports NewsCodes vocabularies, which are mostly used by SportsML and IPTC Sport Schema.

New vocabulary: Horse Position

New entries in Tournament Phase vocabulary: Heat, Round of 16

New entry in Soccer Position: manager

News-related NewsCodes updates

Content Production Party Role: new term Generative AI Prompt Writer which can also be used in Photo Metadata Contributor to declare who wrote the prompt that was used to generate an image.

Genre: new term User-Generated Content.

Why Present: new term associated.

The User Action Type vocabulary, mostly used by NewsML-G2, has had some major changes.

Previously this vocabulary defined terms related to specific social media services or interactions. We have retired/deprecated all site-specific terms (Facebook Likes, Google’s +1, Twitter re-tweets, Twitter tweets).

Instead, we have defined some generic terms: Like, Share, Comment. The pageviews term has been broadened into simply views (although the ID remains as “pageviews” for backwards-compatibility).

 

Thanks to the NewsCodes Working Group for their work on this release, and to all members and non-members who have suggested changes.

 

Screenshot of an article from SearchEngineLand entitled "Google wants you to label AI-generated images used in Merchant Center", with the subtitle: "Don't remove embedded metadata tags such as trainedAlgorithmicMedia from such images," Google wrote.
The new guidance has received some press in the Search Engine Optimisation (SEO) world, including this post on SearchEngineLand.

Google has added Digital Source Type support to Google Merchant Center, enabling images created by generative AI engines to be flagged as such in Google’s products such as Google search, maps, YouTube and Google Shopping.

In a new support post, Google reminds merchants who wish their products to be listed in Google search results and other products that they should not strip embedded metadata, particularly the Digital Source Type field which can be used to signal that content was created by generative AI.

We at the IPTC fully endorse this position. We have been saying for years that website publishers should not strip metadata from images. This should also include tools for maintaining online product inventories, such as Magento and WooCommerce. We welcome contact from developers who wish to learn more about how they can preserve metadata in their images.

Here’s the full text of Google’s recommendation:

Preserving metadata tags for AI-generated images in Merchant Center
February 2024
If you’re using AI-generated images in Merchant Center, Google requires that you preserve any metadata tags which indicate that the image was created using generative AI in the original image file.

Don't remove embedded metadata tags such as trainedAlgorithmicMedia from such images. All AI-generated images must contain the IPTC DigitalSourceType trainedAlgorithmicMedia tag. Learn more about IPTC photo metadata.

These requirements apply to the following image attributes in Merchant Center Classic and Merchant Center Next:

Image link [image_link]
Additional image link [additional_image_link]
Lifestyle image link [lifestyle_image_link]
Learn more about product data specifications.
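
For developers who want a quick check that an image pipeline has not stripped this tag, here is a minimal sketch. It scans raw file bytes for an embedded XMP packet and looks for the IPTC Digital Source Type URI for trainedAlgorithmicMedia. This is a plain byte scan rather than a real XMP parser: it assumes the packet is stored uncompressed as UTF-8, and both helper functions are hypothetical.

```python
# IPTC Digital Source Type NewsCode for media created by a generative AI model
AI_SOURCE_TYPE = b"http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"

XMP_OPEN = b"<x:xmpmeta"
XMP_CLOSE = b"</x:xmpmeta>"


def xmp_packets(data):
    """Yield each <x:xmpmeta>...</x:xmpmeta> span found in the raw file bytes."""
    start = 0
    while True:
        begin = data.find(XMP_OPEN, start)
        if begin == -1:
            return
        end = data.find(XMP_CLOSE, begin)
        if end == -1:
            return
        end += len(XMP_CLOSE)
        yield data[begin:end]
        start = end


def declares_ai_generated(data):
    """True if any embedded XMP packet mentions the trainedAlgorithmicMedia URI."""
    return any(AI_SOURCE_TYPE in packet for packet in xmp_packets(data))


# Synthetic byte strings standing in for image files (not valid images).
TAGGED = (
    b"\xff\xd8"
    + b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:Description Iptc4xmpExt:DigitalSourceType="'
    + AI_SOURCE_TYPE
    + b'"/></x:xmpmeta>'
    + b"\xff\xd9"
)
STRIPPED = b"\xff\xd8\xff\xd9"

print(declares_ai_generated(TAGGED), declares_ai_generated(STRIPPED))
```

Running such a check before and after an upload or resize step is one way to notice when a tool silently drops embedded metadata. A production check should use a proper metadata library instead.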

The IPTC News Architecture Working Group is happy to announce that the NewsML-G2 Guidelines and NewsML-G2 Specification documents have been updated to align with version 2.33 of NewsML-G2, which was approved in October 2023.

The changes include:

Specification changes:

  • Adding authoritystatus and digitalsourcetype, which were introduced in NewsML-G2 versions 2.32 and 2.33
  • Clarification on how @uri, @qcode and @literal attributes should be treated throughout
  • Clarification on how roles should be added to infosource element when an entity plays more than one role
  • Clarifying and improving cross-references and links throughout the document

Guidelines changes:

We always welcome feedback on our specification and guideline documents: please use the Contact Us form to ask for clarifications or suggest changes.

A cute robot penguin painting a picture of itself using a canvas mounted on a wooden easel, in the countryside. Generated by Imagine with Meta AI
An image generated by Imagine with Meta AI, using the prompt “A cute robot penguin painting a picture of itself using a canvas mounted on a wooden easel, in the countryside.” The image contains IPTC DigitalSourceType metadata showing that it was generated by AI.

Yesterday Nick Clegg, Meta’s President of Global Affairs, announced that Meta would be using IPTC embedded photo metadata to label AI-Generated Images on Facebook, Instagram and Threads.

Meta already uses the IPTC Photo Metadata Standard’s Digital Source Type property to label images generated by its platform. The image to the right was generated using Imagine with Meta AI, Meta’s image generation tool. Viewing the image’s metadata with the IPTC’s Photo Metadata Viewer tool shows that the Digital Source Type field is set to “trainedAlgorithmicMedia” as recommended in IPTC’s Guidance on metadata for AI-generated images.

Clegg said that “we do several things to make sure people know AI is involved, including putting visible markers that you can see on the images, and both invisible watermarks and metadata embedded within image files. Using both invisible watermarking and metadata in this way improves both the robustness of these invisible markers and helps other platforms identify them.”

This approach of both direct and indirect disclosure is in line with the Partnership on AI’s Best Practices on signalling the use of generative AI.

Also, Meta are building recognition of this metadata into their tools: “We’re building industry-leading tools that can identify invisible markers at scale – specifically, the “AI generated” information in the C2PA and IPTC technical standards – so we can label images from Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock as they implement their plans for adding metadata to images created by their tools.”

We have previously shared the news that Google, Microsoft, Adobe, Midjourney and Shutterstock will use IPTC metadata in their generated images, either directly in the IPTC Photo Metadata block or using the IPTC Digital Source Type vocabulary as part of a C2PA assertion. OpenAI has just announced that they have started using IPTC via C2PA metadata to signal the fact that images from DALL-E are generated by AI.
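
To make the C2PA route more concrete, the sketch below looks for the IPTC Digital Source Type value inside an actions assertion. The dictionary layout is an assumption, loosely modelled on how a C2PA manifest's c2pa.actions assertion looks once decoded to JSON. The ai_generated_actions helper and the example.other label are hypothetical.

```python
AI_SOURCE_TYPE = "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"


def ai_generated_actions(assertions):
    """Return the actions whose digitalSourceType marks the content as AI-generated."""
    found = []
    for assertion in assertions:
        if not assertion.get("label", "").startswith("c2pa.actions"):
            continue
        for action in assertion.get("data", {}).get("actions", []):
            if action.get("digitalSourceType") == AI_SOURCE_TYPE:
                found.append(action)
    return found


# Assumed shape of decoded manifest assertions; real manifests carry more fields.
SAMPLE_ASSERTIONS = [
    {
        "label": "c2pa.actions",
        "data": {
            "actions": [
                {"action": "c2pa.created", "digitalSourceType": AI_SOURCE_TYPE}
            ]
        },
    },
    {"label": "example.other", "data": {}},  # hypothetical unrelated assertion
]

print(ai_generated_actions(SAMPLE_ASSERTIONS))
```

This only inspects already-decoded data. Verifying the manifest's signature, which is what makes the claim trustworthy, requires a C2PA-aware library and is outside the scope of this sketch.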

A call for platforms to stop stripping image metadata

We at the IPTC agree that this is a great step towards end-to-end support of indirect disclosure of AI-generated content.

As the Meta and OpenAI posts point out, it is possible to strip out both IPTC and C2PA metadata either intentionally or accidentally, so this is not a solution to all problems of content credibility.

Currently, one of the main ways metadata is stripped from images is when they are uploaded to Facebook or other social media platforms. So with this step, we hope that Meta’s platforms will stop stripping metadata from images when they are shared – not just the fields about generative AI, but also the fields regarding accessibility (alt text), copyright, creator’s rights and other information embedded in images by their creators.

Video next?

Meta’s post indicates that this type of metadata isn’t commonly used for video or audio files. We agree, but to be ahead of the curve, we have added Digital Source Type support to IPTC Video Metadata Hub so videos can be labelled in the same way.

We will be very happy to work with Meta and other platforms on making sure IPTC’s standards are implemented correctly in images, videos and other areas.