- group of contracts provided by Moorcrofts
- Wikipedia?
- Google AI training repositories?
- Create some lorem ipsum documents
- For the font detection AI, we can generate the training images ourselves from the fonts we want to support
I think Wikipedia would be a great source to use because it contains a lot of data that can be searched and replaced, like names, years, URL references, etc.
Using lorem ipsum documents could be a starting point, but they won’t offer much diversity in formatting or reflect real-world scenarios!
I agree, but for development tests it will be very useful
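For the development tests, a rough sketch of a lorem ipsum generator could look like the following. It plants known terms (a name, a company, a year, a URL) in the filler text, so a search can be checked against what was actually inserted. The `PLANTED_TERMS` values and the `make_document` helper are made up for illustration and are not part of any existing tooling.

```python
import random

LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
         "eiusmod tempor incididunt ut labore et dolore magna aliqua").split()

# Hypothetical sample terms; real search terms would come from the contracts.
PLANTED_TERMS = ["Jane Example", "Acme Holdings Ltd", "1999",
                 "https://example.com/ref"]


def make_document(seed, paragraphs=3, words_per_paragraph=40):
    """Return (text, planted) where planted lists the terms hidden in text."""
    rng = random.Random(seed)  # seeded, so the same document can be rebuilt
    planted = []
    paras = []
    for _ in range(paragraphs):
        words = [rng.choice(LOREM) for _ in range(words_per_paragraph)]
        term = rng.choice(PLANTED_TERMS)
        # Never insert at position 0, so capitalising the first letter
        # of the paragraph cannot alter a planted term.
        words.insert(rng.randrange(1, len(words) + 1), term)
        planted.append(term)
        text = " ".join(words)
        paras.append(text[0].upper() + text[1:] + ".")
    return "\n\n".join(paras), planted
```

Calling `make_document(1)` twice gives the same text, which keeps test failures reproducible. As noted above, this only exercises plain text and not the formatting found in real documents.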
Moorcrofts will be supplying a set of contracts
We now have a set of contracts for testing, with samples of pre- and (desired!) post-processing for anonymisation. We’ll have to keep the unanonymised versions confidential, but I think once we have a set of search terms working we could create a set of “fake name” versions which could then be used as a demo set and for others to test. The https://contr.ai team will also be looking to produce a set of pro forma contracts with the smart tags from the test set.
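One way the “fake name” versions could be produced, once the search terms are known, is a simple term-to-replacement mapping. This is only a sketch under that assumption: the names in `FAKE_NAMES` and the `substitute_names` helper are invented for illustration, and matching here is case-sensitive and exact.

```python
import re

# Hypothetical mapping from confidential terms to stand-ins.
FAKE_NAMES = {
    "Real Client Ltd": "Acme Holdings Ltd",
    "Real Client": "Acme",
    "John Smith": "Alex Sample",
}


def build_pattern(mapping):
    """Compile one regex that matches any key of the mapping as a whole term."""
    # Longest terms first, so "Real Client Ltd" wins over "Real Client".
    terms = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in terms)
    # Lookarounds stop a match inside a longer word, e.g. "John Smithson".
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")


def substitute_names(text, mapping):
    """Replace every mapped term in text with its fake counterpart."""
    pattern = build_pattern(mapping)
    return pattern.sub(lambda match: mapping[match.group(0)], text)
```

For example, `substitute_names("Signed by John Smith.", FAKE_NAMES)` gives `"Signed by Alex Sample."`. Using the same mapping across the whole set keeps each fake name consistent between documents.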