New Licence to Guide AI Data
Community-led dataset initiatives have been growing, including the Telugu language dataset developed by VISWAM.AI and Swecha
Hyderabad: VISWAM.AI, a joint initiative of Swecha and IIIT Hyderabad, along with SFLC.in, has released a draft licence for public consultation to address gaps in existing open-source licences in the age of artificial intelligence.
The draft was announced during a stakeholder roundtable titled “Understanding Trust and Safety in AI: From Code to Creativity”, held at IIIT Hyderabad in collaboration with FOSS United and The Linux Foundation.
The proposed licence seeks to protect community-created datasets, especially those used to train AI systems. Most open-source licences focus on software code and do not adequately cover training data, raising concerns that large companies could use community datasets without credit or contribution.
Community-led dataset initiatives have been growing, including the Telugu language dataset developed by VISWAM.AI and Swecha. Speakers warned that without strong licensing rules, such datasets risk being taken over by proprietary players.
“The licence is built on community ownership and reciprocity, ensuring that value created using community data flows back to the community, similar to copyleft principles,” speakers said. “It also stresses verifiability, requiring clear information about data sources to identify bias and ensure safety.”
Kiran Chandra Yarlagadda of VISWAM.AI said existing licences were not designed for a world where culture and language are used to train machines. Prasanth Sugathan of SFLC.in added that the licence was needed to stop big tech companies from using community work without attribution.