AWS Glue Data Catalog
The AWS Glue Data Catalog connector for Rudol imports the documentation already stored in your Glue catalog and extracts data lineage from your Glue jobs.
Connection parameters
| Name | Type | Description |
|---|---|---|
| Region | text | AWS region where your Glue Data Catalog lives (e.g. us-east-1) |
| Database | text | Glue database name you want to connect to |
| Access Key ID | text | Access key ID of the IAM user created for Rudol |
| Secret Access Key | password | Secret access key of the IAM user created for Rudol |
| Role ARN | text | ARN of the IAM role Rudol will assume to access your catalog |
AWS setup
1. Create the IAM policy for Glue access
- Go to IAM → Policies → Create policy
- Select the JSON tab and paste the following:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "glue:GetDatabases",
            "glue:GetTables",
            "glue:GetTable",
            "glue:GetJob",
            "glue:GetJobs",
            "glue:GetJobRun",
            "glue:GetJobRuns"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:ListBucket"
          ],
          "Resource": [
            "arn:aws:s3:::<glue-scripts-bucket>",
            "arn:aws:s3:::<glue-scripts-bucket>/*",
            "arn:aws:s3:::<datalake-bucket>",
            "arn:aws:s3:::<datalake-bucket>/*"
          ]
        }
      ]
    }
Replace the following placeholders with your values:
| Placeholder | Description |
|---|---|
| <glue-scripts-bucket> | Bucket where Glue stores the job scripts (.py files) |
| <datalake-bucket> | Bucket(s) where the catalog tables reside (e.g. bronze and silver layers). Add one entry per bucket if they are separate. |
- Name the policy (e.g. RudolGlueReadOnly) and click Create policy
Info: The Glue actions enable Rudol to read table and column documentation from the Data Catalog, and to extract data lineage by reading job definitions and run history. The S3 actions are needed to read job scripts and match data locations to catalog tables.
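With several data lake buckets, writing the S3 resource list by hand is error-prone. The following is a minimal Python sketch that generates the policy document from a list of bucket names. The helper name `build_glue_read_policy` and the example bucket names are illustrative assumptions, not part of Rudol.

```python
import json

# Glue actions copied from the policy in step 1.
GLUE_READ_ACTIONS = [
    "glue:GetDatabases",
    "glue:GetTables",
    "glue:GetTable",
    "glue:GetJob",
    "glue:GetJobs",
    "glue:GetJobRun",
    "glue:GetJobRuns",
]


def build_glue_read_policy(scripts_bucket, datalake_buckets):
    """Return the step 1 policy as a dict (hypothetical helper)."""
    s3_resources = []
    for bucket in [scripts_bucket, *datalake_buckets]:
        # s3:ListBucket applies to the bucket ARN,
        # s3:GetObject to the objects under it.
        s3_resources.append(f"arn:aws:s3:::{bucket}")
        s3_resources.append(f"arn:aws:s3:::{bucket}/*")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": GLUE_READ_ACTIONS, "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": s3_resources,
            },
        ],
    }


# Example bucket names are placeholders; substitute your own.
policy = build_glue_read_policy("my-glue-scripts", ["my-bronze", "my-silver"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor.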
2. Create the IAM role
- Go to IAM → Roles → Create role
- Select AWS account as the trusted entity type and enter your own Account ID
- Click Next and attach the RudolGlueReadOnly policy you just created
- Name the role (e.g. RudolGlueRole) and click Create role
- Copy the Role ARN (e.g. arn:aws:iam::123456789012:role/RudolGlueRole)
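For reference, choosing AWS account as the trusted entity attaches a trust policy to the role that names your account as the principal. Below is a rough Python sketch of that document. `build_trust_policy` is a hypothetical helper, 123456789012 is the example account ID used above, and the document the console actually generates may differ in detail (for instance, it can include an empty Condition block).

```python
import json


def build_trust_policy(account_id):
    """Return a trust policy that lets principals in `account_id` assume the role."""
    if not (account_id.isascii() and account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account root principal stands for the whole account.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(build_trust_policy("123456789012"), indent=2))
```

Trusting the account root delegates the decision to IAM policies inside that account, which is why the user created in step 3 still needs an explicit sts:AssumeRole permission.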
3. Create the IAM user for Rudol
- Go to IAM → Users → Create user
- Enter a name (e.g. rudol-glue-user) and click Next
- Select Attach policies directly and click Create policy to create a new one with the following JSON, replacing <your-account-id> and <role-name> with your values:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Resource": "arn:aws:iam::<your-account-id>:role/<role-name>"
        }
      ]
    }
- Name the policy (e.g. RudolAssumeGlueRole), create it, and attach it to the user
- Click Next and then Create user
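The Resource in this policy should be identical to the Role ARN copied in step 2. The sketch below builds the policy directly from that ARN and rejects strings that do not look like an IAM role ARN. The helper name and the regular expression are assumptions for illustration, and the pattern only covers the standard `aws` partition.

```python
import json
import re

# Rough shape of an IAM role ARN in the standard partition (an assumption,
# not an official validator): 12-digit account ID, then role/<path-and-name>.
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$", re.ASCII)


def build_assume_role_policy(role_arn):
    """Return the step 3 policy for the given role ARN (hypothetical helper)."""
    if not ROLE_ARN_RE.match(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }


print(json.dumps(
    build_assume_role_policy("arn:aws:iam::123456789012:role/RudolGlueRole"),
    indent=2,
))
```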
4. Generate the credentials
- Open the user you just created
- Go to Security credentials → Access keys → Create access key
- Select Third-party service as the use case
- Copy the Access Key ID and Secret Access Key and paste them into the Rudol connection form along with the Role ARN from step 2
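Before saving the connection, you may want to confirm that the credentials can assume the role and read the catalog. The sketch below does this in Python: `list_catalog_tables` assumes the role through STS and pages through `get_tables` for one database. It is an illustration built on boto3 (the AWS SDK for Python), not a description of how Rudol connects internally; the function names and the session name are assumptions.

```python
def list_catalog_tables(sts_client, make_glue_client, role_arn, database):
    """Assume the role, then return the table names of one Glue database."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="rudol-connection-check",  # arbitrary label (assumption)
    )
    # Build a Glue client from the temporary credentials of the assumed role.
    glue = make_glue_client(response["Credentials"])
    names, token = [], None
    while True:
        kwargs = {"DatabaseName": database}
        if token:
            kwargs["NextToken"] = token
        page = glue.get_tables(**kwargs)
        names.extend(table["Name"] for table in page["TableList"])
        token = page.get("NextToken")
        if not token:
            return names


def check_with_boto3(access_key_id, secret_access_key, role_arn, region, database):
    """Wire the helper to real AWS clients; requires boto3 to be installed."""
    import boto3  # imported lazily so the helper above stays dependency-free

    sts = boto3.client(
        "sts",
        region_name=region,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )

    def make_glue_client(creds):
        return boto3.client(
            "glue",
            region_name=region,
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )

    return list_catalog_tables(sts, make_glue_client, role_arn, database)
```

Calling `check_with_boto3` with the same five values you enter in the connection form should return the table names of that database. An access-denied error on `assume_role` usually points at step 2 or 3, while one on `get_tables` points at the policy from step 1.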
Restrict access by resource
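To limit Glue access to specific databases or tables, replace "Resource": "*" in the step 1 policy with the ARNs of the catalog, the databases, and their tables. These ARNs only match the catalog actions (glue:GetDatabases, glue:GetTables, glue:GetTable). The job actions apply to Glue jobs rather than catalog resources, so move them to their own statement before scoping. For example, for my_database in us-east-1:
    "Resource": [
      "arn:aws:glue:us-east-1:123456789012:catalog",
      "arn:aws:glue:us-east-1:123456789012:database/my_database",
      "arn:aws:glue:us-east-1:123456789012:table/my_database/*"
    ]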
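When several databases need to be exposed, the same three-ARN pattern repeats per database. A small Python sketch that generates the list; `scoped_glue_resources` is a hypothetical helper, and the region, account ID, and database name are the example values used above.

```python
import json


def scoped_glue_resources(region, account_id, databases):
    """Return catalog, database, and table ARNs for the given Glue databases."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    # The catalog ARN is listed once, followed by two entries per database.
    resources = [f"{prefix}:catalog"]
    for database in databases:
        resources.append(f"{prefix}:database/{database}")
        resources.append(f"{prefix}:table/{database}/*")
    return resources


print(json.dumps(
    scoped_glue_resources("us-east-1", "123456789012", ["my_database"]),
    indent=2,
))
```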