Concepts
This list describes key Exfil concepts and terminology. It's a good place to get started learning how to set up and use Exfil.
Dataset
A dataset is a collection of data, typically organized in a structured format, that is used for analysis, research, or machine learning tasks. The data in a dataset can come in various forms, such as numbers, text, images, or even multimedia.
Model
A model learns from a dataset during training so that it can make accurate predictions or decisions based on new, unseen data.
Project
A project is where the accuracy of the model is evaluated by testing its predictions against actual outcomes. This involves using a set of data that was not used during the training process to measure how well the model performs in making correct predictions or classifications.
Training
Document
A document forms part of the project's training dataset. When a document is uploaded, Exfil converts it into a collection of text blocks that preserve the text's original location on each page.
Status
The status represents the state of the document. A newly added document has a status of NEW. Any document marked as DONE will be used in the dataset when a new model is trained. The REVIEW status allows you to manage the labelling process or flag documents for follow-up.
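The status rules above can be sketched in a few lines of Python. This is an illustration only, not Exfil's API: the `Status` enum, the `Document` class, the `training_documents` helper, and the file and tag names are all hypothetical, chosen to mirror the three statuses described here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Status(Enum):
    """The three document statuses described above (hypothetical model)."""
    NEW = "NEW"
    REVIEW = "REVIEW"
    DONE = "DONE"


@dataclass
class Document:
    name: str
    status: Status = Status.NEW  # a newly added document starts as NEW
    tags: Set[str] = field(default_factory=set)


def training_documents(documents: List[Document]) -> List[Document]:
    """Keep only DONE documents, the ones used when a new model is trained."""
    return [doc for doc in documents if doc.status is Status.DONE]


docs = [
    Document("invoice-001.pdf", Status.DONE),
    Document("invoice-002.pdf", Status.REVIEW, {"blurry-scan"}),
    Document("invoice-003.pdf"),  # defaults to NEW
]
selected = [doc.name for doc in training_documents(docs)]
print(selected)
```

In this sketch only `invoice-001.pdf` is selected; the REVIEW and NEW documents stay out of the training dataset until they are marked as DONE.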
Tag
A tag allows you to categorise documents. Tags might be used to identify a document format or flag problem documents.
Field
A field is a specific type of data found in a document type. For example, it could be an address, a total amount, or an invoice date. A field can either be document level (representative of the entire document) or a column within a table (with a value in each row of the table).
Label
A label is a single occurrence of a field within a document. Multiple labels can be assigned for a single field, but each text block can only be labelled once.
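The labelling rule above (many labels per field, at most one label per text block) can be illustrated with a small Python sketch. The `LabelledDocument` class, its methods, the block ids, and the field names are hypothetical and are not how Exfil stores labels internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelledDocument:
    # Maps a text block id to the field it is labelled with (assumed layout).
    labels: Dict[int, str] = field(default_factory=dict)

    def add_label(self, block_id: int, field_name: str) -> None:
        """Label a text block, refusing blocks that already carry a label."""
        if block_id in self.labels:
            raise ValueError(
                f"text block {block_id} is already labelled as "
                f"{self.labels[block_id]!r}"
            )
        self.labels[block_id] = field_name

    def blocks_for(self, field_name: str) -> List[int]:
        """Return the ids of every text block labelled with the given field."""
        return sorted(b for b, f in self.labels.items() if f == field_name)


doc = LabelledDocument()
doc.add_label(4, "invoice_date")
doc.add_label(9, "line_total")
doc.add_label(12, "line_total")  # one field may have several labels

try:
    doc.add_label(9, "address")  # block 9 already has a label
    relabel_rejected = False
except ValueError:
    relabel_rejected = True
```

Keying the mapping by text block makes the "labelled once" rule a simple membership check, while a field can still collect any number of blocks.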
Train
A trained model applies what it has learned from the documents in the training dataset to extract data from a previously unseen document. Exfil learns over time, with more data in the training dataset leading to more accurate results in each successive model. A model can be trained after several documents have been labelled and marked as DONE within the Training section.
To help you navigate the complexities of machine learning training, Unmand manages the entire model training process for you. Depending on the variability within your dataset, roughly 25-50 documents are needed to train an initial model. As a rule of thumb, the more documents, the higher the accuracy.