AWS S3
Steps to integrate AWS S3 with Locale.ai
You can use an S3 bucket to send your data directly to the Locale.ai platform. Here's how it's done.

Basic Requirements

You will need the following things before you can integrate your S3 Bucket.
  1. A dedicated bucket in S3. The data you want to upload for one or more entities should be placed in this bucket; each dataset can also be imported from a separate bucket.
  2. All the files used for one dataset should share a common prefix. The prefix can be a folder or simply a prefix in the file name.
  3. Since Locale.ai automatically picks up files from your bucket once the integration is done, you'll also need to specify a refresh interval: the rate at which new files are fetched from your bucket.
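To illustrate the common-prefix requirement above, here is a minimal sketch of how files for one dataset are selected by a shared prefix. The bucket keys and the `orders/` prefix are hypothetical examples, not names from the Locale.ai platform:

```python
# Sketch: files belonging to one dataset are identified by a shared prefix.
# All keys and the "orders/" prefix below are hypothetical examples.

def select_dataset_files(keys, prefix):
    """Return only the object keys that belong to one dataset."""
    return [k for k in keys if k.startswith(prefix)]

bucket_keys = [
    "orders/2023-01-01.csv",   # picked up: shares the "orders/" prefix
    "orders/2023-01-02.csv",   # picked up
    "drivers/2023-01-01.csv",  # ignored: belongs to a different dataset
]

print(select_dataset_files(bucket_keys, "orders/"))
```

The same idea applies whether the prefix is a folder (`orders/`) or just the start of a file name (`orders-2023`).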

Integration Steps

1. Select the type of entity you want to create.

2. In the data source selection step select S3.

3. Connection and Authentication

Details like the bucket name, file prefix, file type (CSV, JSON, Parquet), file compression, refresh interval, and the date after which data should be picked up are specified here. Check the Basic Requirements section for more details.
Once added, we'll display a bucket policy that you need to attach to your S3 bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::430692669414:user/gaia-extraction-user"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>",
      "Condition": {
        "StringEquals": {
          "s3:prefix": "<folder>/"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::430692669414:user/gaia-extraction-user"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucket-name>/<folder>/*"
    }
  ]
}
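If you'd rather attach the policy from code than through the S3 console, a minimal boto3 sketch could look like the following. The `build_bucket_policy` helper and its placeholder arguments are illustrative, not part of the Locale.ai product; `put_bucket_policy` is the standard boto3 S3 call and requires AWS credentials with permission to modify the bucket:

```python
import json

# The extraction user ARN is taken from the policy shown above.
EXTRACTION_USER = "arn:aws:iam::430692669414:user/gaia-extraction-user"

def build_bucket_policy(bucket, folder):
    """Build the bucket policy shown above as a Python dict.

    bucket/folder are placeholders for your own bucket name and prefix.
    """
    principal = {"AWS": EXTRACTION_USER}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringEquals": {"s3:prefix": f"{folder}/"}},
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{folder}/*",
            },
        ],
    }

def attach_policy(bucket, folder):
    """Attach the policy to the bucket (needs configured AWS credentials)."""
    import boto3  # AWS SDK for Python; imported here so the sketch runs without it

    boto3.client("s3").put_bucket_policy(
        Bucket=bucket,
        Policy=json.dumps(build_bucket_policy(bucket, folder)),
    )
```

Equivalently, you can paste the policy JSON into the bucket's Permissions → Bucket policy editor in the AWS console.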

4. Hang tight while we set up your integration and check for errors.

5. Select a file from your bucket to configure your data.

6. Configure the columns from your data, along with their types, according to our data standards.

7. Monitor the jobs that refresh each table; they give you details of rows processed, validations, and errors.