
Concepts

This list describes key Exfil concepts and terminology. It's a good place to start learning how to set up and use Exfil.

Dataset

A dataset is a collection of data, typically organized in a structured format, that is used for analysis, research, or machine learning tasks. The data in a dataset can come in various forms, such as numbers, text, images, or even multimedia.

Model

A model learns from a dataset during training. The purpose of training is to enable the model to make accurate predictions or decisions based on new, unseen data.

Project

A project is where a model's accuracy is evaluated by testing its predictions against actual outcomes. The evaluation uses data that was not part of the training process, so it measures how well the model makes correct predictions or classifications on documents it has not seen before.
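As a rough illustration of this kind of evaluation (not Exfil's own metric or API), the sketch below scores predictions on held-out data as the fraction that exactly match the actual outcomes. The function name `holdout_accuracy` and the example values are hypothetical.

```python
def holdout_accuracy(predictions: list[str], actuals: list[str]) -> float:
    """Fraction of predictions that exactly match the actual outcomes.

    Hypothetical helper: assumes both lists come from documents that were
    held out of training and are aligned item by item.
    """
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be the same length")
    if not actuals:
        raise ValueError("cannot evaluate on an empty held-out set")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Hypothetical extracted invoice totals compared with the true values.
predicted = ["120.00", "89.50", "45.00", "300.10"]
actual = ["120.00", "89.50", "54.00", "300.10"]
print(holdout_accuracy(predicted, actual))  # 0.75
```

Exact-match accuracy is only one possible measure; a real evaluation may score each field separately or allow for formatting differences.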

Training

Document

A document forms part of the project's training dataset. When a document is uploaded, Exfil converts it into a collection of text blocks that preserve the text's original location on each page.
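To make the idea of location-preserving text blocks concrete, here is a minimal sketch of one possible representation. The `TextBlock` class, its coordinate fields, and the normalised 0 to 1 coordinate convention are assumptions for illustration only; Exfil's internal format is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical text block: a run of text plus where it sits on its page."""
    text: str
    page: int   # 1-based page number (assumed convention)
    x0: float   # left edge, as a fraction of page width (assumed)
    y0: float   # top edge, as a fraction of page height (assumed)
    x1: float   # right edge
    y1: float   # bottom edge


def blocks_on_page(blocks: list[TextBlock], page: int) -> list[TextBlock]:
    """Return one page's blocks ordered top to bottom, then left to right."""
    on_page = [b for b in blocks if b.page == page]
    return sorted(on_page, key=lambda b: (b.y0, b.x0))


blocks = [
    TextBlock("Total", 1, 0.60, 0.80, 0.70, 0.83),
    TextBlock("INVOICE", 1, 0.10, 0.05, 0.30, 0.09),
    TextBlock("$120.00", 1, 0.75, 0.80, 0.90, 0.83),
]
print([b.text for b in blocks_on_page(blocks, 1)])  # ['INVOICE', 'Total', '$120.00']
```

Keeping the coordinates alongside the text is what lets a label, or a model, take layout into account (for example, that an amount sits to the right of the word "Total").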

Status

The status represents the state of a document. A newly added document has a status of NEW. Any document marked as DONE is included in the dataset when a new model is trained. The REVIEW status allows you to manage the labelling process or flag documents for follow-up.
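The rule that only DONE documents feed into training can be sketched as a simple filter. This is a hypothetical model covering just the three statuses named above; the `Document` class and `training_set` function are illustrative names, not part of Exfil.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The three document statuses described in this section."""
    NEW = "NEW"        # newly added document
    REVIEW = "REVIEW"  # labelling in progress or flagged for follow-up
    DONE = "DONE"      # included when a new model is trained


@dataclass
class Document:
    """Hypothetical minimal document record."""
    name: str
    status: Status = Status.NEW  # newly added documents start as NEW


def training_set(documents: list[Document]) -> list[Document]:
    """Select only the documents a new model would be trained on."""
    return [d for d in documents if d.status is Status.DONE]


docs = [
    Document("invoice-001.pdf", Status.DONE),
    Document("invoice-002.pdf", Status.REVIEW),
    Document("invoice-003.pdf"),
]
print([d.name for d in training_set(docs)])  # ['invoice-001.pdf']
```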

Tag

A tag allows you to categorise documents. Tags might be used to identify a document format or flag problem documents.

Field

A field represents a specific type of data within a document type, for example an address, a total amount, or an invoice date. A field is either document level (one value representing the entire document) or a column within a table (one value for each row in the table).
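The difference between the two kinds of field shows up in the shape of the extracted data. The sketch below is a hypothetical schema, not Exfil's configuration format: `FieldDef`, `FieldScope`, and the line-item field names are assumptions, while the address, total amount, and invoice date fields come from the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class FieldScope(Enum):
    DOCUMENT = "document"          # one value for the whole document
    TABLE_COLUMN = "table_column"  # one value per row of a table


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical field definition for a document type."""
    name: str
    scope: FieldScope


invoice_fields = [
    FieldDef("address", FieldScope.DOCUMENT),
    FieldDef("total_amount", FieldScope.DOCUMENT),
    FieldDef("invoice_date", FieldScope.DOCUMENT),
    FieldDef("line_item_description", FieldScope.TABLE_COLUMN),  # hypothetical column
    FieldDef("line_item_amount", FieldScope.TABLE_COLUMN),       # hypothetical column
]


def empty_result(fields: list[FieldDef]) -> dict[str, object]:
    """Empty extraction result: a single slot for each document-level field,
    a list (to hold one entry per table row) for each table-column field."""
    return {f.name: (None if f.scope is FieldScope.DOCUMENT else []) for f in fields}


print(empty_result(invoice_fields))
```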

Label

A label is a single occurrence of a field within a document. Multiple labels can be assigned for a single field, but each text block can only be labelled once.
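The two labelling rules (many labels per field, at most one label per text block) can be sketched as a small store keyed by text block. `LabelSet` and its methods are hypothetical names used only to illustrate the constraint.

```python
class LabelSet:
    """Hypothetical per-document label store: a field may have many labels,
    but each text block may be labelled only once."""

    def __init__(self) -> None:
        self._field_by_block: dict[int, str] = {}  # text-block id -> field name

    def add(self, block_id: int, field: str) -> None:
        """Label a text block, rejecting blocks that already carry a label."""
        if block_id in self._field_by_block:
            raise ValueError(
                f"text block {block_id} is already labelled as "
                f"{self._field_by_block[block_id]!r}"
            )
        self._field_by_block[block_id] = field

    def blocks_for(self, field: str) -> list[int]:
        """All text blocks labelled with the given field (possibly several)."""
        return [b for b, f in self._field_by_block.items() if f == field]


labels = LabelSet()
labels.add(3, "address")
labels.add(4, "address")  # a second label for the same field is allowed
print(labels.blocks_for("address"))  # [3, 4]
try:
    labels.add(4, "invoice_date")  # block 4 is already labelled
except ValueError as err:
    print(err)  # text block 4 is already labelled as 'address'
```

Keying the store by text block makes the "one label per block" rule a simple membership check, while a field spanning several blocks (such as a multi-line address) just maps to several keys.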

Train

A model applies what it has learned from the documents in the training dataset to extract data from a previously unseen document. Exfil learns over time: more data in the training dataset leads to more accurate results in each successive model. A model can be trained once several documents have been labelled and marked as DONE within the Training section.

To help you navigate the complexities of machine learning training, Unmand manages the entire model training process for you. Depending on the variability within your dataset, roughly 25-50 documents are needed to train an initial model. As a rule of thumb, the more documents, the higher the accuracy.
