Sources for test documents

  • group of contracts provided by Moorcrofts
  • Wikipedia?
  • Google AI training repositories?
  • Create some lorem ipsum documents
  • For the font detection AI, we can generate training images ourselves from the fonts we want to support

I think Wikipedia would be a great source to use because it contains a lot of data that can be searched and replaced, like names, years, URL references etc.

Using lorem ipsum documents could be a starting point, but they won’t offer much diversity in formatting or reflect real-world scenarios!

I agree, but for development tests it will be very useful
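
For what it's worth, here is a minimal sketch of how repeatable lorem ipsum test documents could be generated for development tests. Everything in it is an assumption for illustration: the `make_document` name, the word list, and the numbered-heading layout are hypothetical and not part of any existing tooling. Seeding the generator means the same seed always gives the same document, which keeps test runs comparable.

```python
import random

# Small illustrative vocabulary (an assumption; any word list would do).
WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()


def make_document(seed, paragraphs=3):
    """Build a plain-text document of numbered headings and filler paragraphs."""
    rng = random.Random(seed)  # private generator, so the output is repeatable
    lines = []
    for number in range(1, paragraphs + 1):
        # Heading: three distinct words in title case, e.g. "1. Dolor Sit Magna"
        heading = " ".join(rng.sample(WORDS, 3)).title()
        lines.append(f"{number}. {heading}")
        # Body: two to four sentences of six to twelve words each
        sentences = []
        for _ in range(rng.randint(2, 4)):
            words = rng.choices(WORDS, k=rng.randint(6, 12))
            sentences.append(" ".join(words).capitalize() + ".")
        lines.append(" ".join(sentences))
        lines.append("")  # blank line between sections
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_document(seed=1))
```

This only covers plain text, so it does nothing for the formatting diversity concern raised above.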

Moorcrofts will be supplying a set of contracts

We now have a set of contracts for testing, with samples of pre- and (desired!) post-processing for anonymisation. We’ll have to keep the unanonymised versions confidential, but I think once we have a set of search terms working we could create a set of “fake name” versions, which could then be used as a demo set and for others to test with. The https://contr.ai team will also be looking to produce a set of pro forma contracts with the smart tags from the test set.
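
As a rough illustration of the "fake name" idea, here is a minimal sketch that swaps a list of search terms for stand-in names in plain text. The function names, the mapping, and the sample sentence are all made up for the example; none of them come from the real contracts or from any existing tool, and real documents would also need handling for line breaks inside names and for formatting.

```python
import re


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any of the search terms."""
    # Longest terms first, so "Acme Holdings Ltd" is tried before "Acme".
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    # The lookarounds stop a term from matching inside a longer word.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def pseudonymise(text, replacements):
    """Replace every search term in text with its fake counterpart."""
    lookup = {term.lower(): fake for term, fake in replacements.items()}
    pattern = build_pattern(replacements)
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


if __name__ == "__main__":
    # Hypothetical search terms and fake names, for illustration only.
    fake_names = {
        "Jane Smith": "Alex Example",
        "Acme Holdings Ltd": "Sample Company Ltd",
        "Acme": "Sample",
    }
    clause = "This Agreement is between ACME Holdings Ltd and Jane Smith."
    print(pseudonymise(clause, fake_names))
```

Doing all the replacements in a single pass means a fake name is never itself picked up and replaced again by a later search term.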
