AWS Glue Data Catalog

The AWS Glue Data Catalog connector for Rudol lets you import the documentation already stored in your Glue catalog and extract data lineage from your Glue jobs.

Connection parameters

| Name | Type | Description |
| --- | --- | --- |
| Region | `text` | AWS region where your Glue Data Catalog lives (e.g. `us-east-1`) |
| Database | `text` | Name of the Glue database you want to connect to |
| Access Key ID | `text` | Access key ID of the IAM user created for Rudol |
| Secret Access Key | `password` | Secret access key of the IAM user created for Rudol |
| Role ARN | `text` | ARN of the IAM role Rudol will assume to access your catalog |

AWS setup

1. Create the IAM policy for Glue access

Go to IAM → Policies → Create policy
Select the JSON tab and paste the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabases",
        "glue:GetTables",
        "glue:GetTable",
        "glue:GetJob",
        "glue:GetJobs",
        "glue:GetJobRun",
        "glue:GetJobRuns"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<glue-scripts-bucket>",
        "arn:aws:s3:::<glue-scripts-bucket>/*",
        "arn:aws:s3:::<datalake-bucket>",
        "arn:aws:s3:::<datalake-bucket>/*"
      ]
    }
  ]
}

Replace the following placeholders with your values:

| Placeholder | Description |
| --- | --- |
| `<glue-scripts-bucket>` | Bucket where Glue stores the job scripts (`.py` files) |
| `<datalake-bucket>` | Bucket(s) where the catalog tables reside (e.g. bronze and silver layers). If they are separate buckets, add both ARN entries (bucket and `/*`) for each one. |

Name the policy (e.g. RudolGlueReadOnly) and click Create policy

Note: The Glue actions let Rudol read table and column documentation from the Data Catalog, and extract data lineage by reading job definitions and run history. The S3 actions are needed to read job scripts and to match data locations to catalog tables.
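When several datalake buckets are involved, filling in the placeholders by hand is easy to get wrong, because each bucket needs two ARN entries. The following is a minimal sketch that generates the step 1 policy from a list of bucket names. The function name `build_glue_read_policy` and the example bucket names are hypothetical and not part of Rudol or AWS.

```python
import json

# Glue read actions copied from the policy in step 1.
GLUE_READ_ACTIONS = [
    "glue:GetDatabases",
    "glue:GetTables",
    "glue:GetTable",
    "glue:GetJob",
    "glue:GetJobs",
    "glue:GetJobRun",
    "glue:GetJobRuns",
]


def build_glue_read_policy(scripts_bucket, datalake_buckets):
    """Return the step 1 policy as a dict with the bucket placeholders filled in."""
    s3_resources = []
    for bucket in [scripts_bucket, *datalake_buckets]:
        # Two entries per bucket: the bucket ARN (used by s3:ListBucket)
        # and the object ARN pattern (used by s3:GetObject).
        s3_resources.append(f"arn:aws:s3:::{bucket}")
        s3_resources.append(f"arn:aws:s3:::{bucket}/*")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": GLUE_READ_ACTIONS, "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": s3_resources,
            },
        ],
    }


# Hypothetical bucket names, for illustration only.
policy = build_glue_read_policy("my-glue-scripts", ["my-bronze", "my-silver"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor.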

2. Create the IAM role

Go to IAM → Roles → Create role
Select AWS account as the trusted entity type and enter your own Account ID
Click Next, attach the RudolGlueReadOnly policy you just created
Name the role (e.g. RudolGlueRole) and click Create role
Copy the Role ARN (e.g. arn:aws:iam::123456789012:role/RudolGlueRole)
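Choosing AWS account as the trusted entity makes the console write the role's trust policy for you. As a rough sketch of what that trust relationship looks like, assuming the account root principal form and no extra conditions (such as an external ID), the helper below builds an equivalent document. The function name `build_trust_policy` is hypothetical; compare the output with the Trust relationships tab of your role rather than treating it as the exact console output.

```python
import json


def build_trust_policy(account_id):
    """Trust policy that lets IAM principals in `account_id` assume the role."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("expected a 12-digit AWS account ID")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account root principal delegates the decision to IAM
                # policies inside the account (here, the policy from step 3).
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


print(json.dumps(build_trust_policy("123456789012"), indent=2))
```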

3. Create the IAM user for Rudol

Go to IAM → Users → Create user
Enter a name (e.g. rudol-glue-user) and click Next
Select Attach policies directly and click Create policy to create a new one with the following JSON, replacing <your-account-id> and <role-name> with your values:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::<your-account-id>:role/<role-name>"
    }
  ]
}

Name the policy (e.g. RudolAssumeGlueRole), create it, and attach it to the user
Click Next and then Create user

4. Generate the credentials

Open the user you just created
Go to Security credentials → Access keys → Create access key
Select Third-party service as the use case
Copy the Access Key ID and Secret Access Key and paste them into the Rudol connection form along with the Role ARN from step 2
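Before saving the connection, it can help to confirm that the access keys can assume the role and that the role can read the catalog. The sketch below does this with `boto3`, which is assumed to be installed. The helper names `parse_role_arn` and `list_table_names` and the session name are hypothetical. It illustrates the user, then role, then Glue chain described above, and is not a description of how Rudol connects internally.

```python
def parse_role_arn(role_arn):
    """Split an IAM role ARN into (account_id, role_name); raise if malformed.

    Assumes the standard "aws" partition, as in the examples on this page.
    """
    prefix = "arn:aws:iam::"
    if not role_arn.startswith(prefix) or ":role/" not in role_arn:
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    account_id, role_path = role_arn[len(prefix):].split(":role/", 1)
    # Roles may have a path (role/some/path/Name); keep only the final name.
    return account_id, role_path.rsplit("/", 1)[-1]


def list_table_names(access_key_id, secret_access_key, role_arn, region, database):
    """Assume the role with the user's keys, then list table names in one database."""
    import boto3  # imported lazily so parse_role_arn works without boto3

    parse_role_arn(role_arn)  # fail early on an obviously wrong ARN
    sts = boto3.client(
        "sts",
        region_name=region,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    creds = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="rudol-connection-check",  # arbitrary label
    )["Credentials"]
    glue = boto3.client(
        "glue",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    names = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        names.extend(table["Name"] for table in page["TableList"])
    return names
```

An access-denied error from `assume_role` usually points at step 2 or step 3. An empty list from a database that is known to contain tables may point at Lake Formation.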

Lake Formation

If your AWS account uses AWS Lake Formation as a governance layer, IAM permissions alone are not enough. Lake Formation adds its own access control on top of IAM: even if the IAM policy is correct, Rudol will get empty results (no databases, no tables) unless the Lake Formation grants are also configured.

For a full reference on Lake Formation permissions, see the AWS Lake Formation documentation.

The grants must be applied on the IAM role that Rudol assumes (RudolGlueRole), not on the IAM user.

Grant access to databases and tables

Go to Lake Formation → Data lake permissions → Grant
Under Principal, select the IAM role used by Rudol (e.g. RudolGlueRole)
Under LF-Tags or catalog resources, select Named data catalog resources
Choose the Database you want to grant access to
Under Database permissions, check Describe
To also grant access to tables within that database, expand Table permissions, select All tables, and check Describe
Click Grant
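The same grants can be scripted when there are many databases. The sketch below builds the arguments for the Lake Formation `GrantPermissions` API and applies them with `boto3`. It assumes `boto3` is installed and that the caller is allowed to grant these permissions (for example, a data lake administrator). The helper names `describe_grant_requests` and `apply_grants` are hypothetical.

```python
def describe_grant_requests(role_arn, database):
    """Build keyword arguments for two GrantPermissions calls on one database."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    return [
        # Describe on the database itself.
        {
            "Principal": principal,
            "Resource": {"Database": {"Name": database}},
            "Permissions": ["DESCRIBE"],
        },
        # Describe on every table in the database; the empty TableWildcard
        # corresponds to choosing "All tables" in the console.
        {
            "Principal": principal,
            "Resource": {"Table": {"DatabaseName": database, "TableWildcard": {}}},
            "Permissions": ["DESCRIBE"],
        },
    ]


def apply_grants(requests, region):
    """Send each prepared request to Lake Formation."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("lakeformation", region_name=region)
    for request in requests:
        client.grant_permissions(**request)
```

For example, `apply_grants(describe_grant_requests("arn:aws:iam::123456789012:role/RudolGlueRole", "my_database"), "us-east-1")` would issue both grants for one database.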

Grant access to S3 data locations

If the datalake buckets are registered in Lake Formation as data lake locations, you also need to grant access to those locations:

Go to Lake Formation → Data lake permissions → Grant
Under Principal, select the IAM role used by Rudol (e.g. RudolGlueRole)
Under LF-Tags or catalog resources, select Data location
Choose the S3 location(s) corresponding to your datalake buckets
Check Data location and click Grant
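For reference, a data location grant maps to the Lake Formation `GrantPermissions` API with the `DATA_LOCATION_ACCESS` permission. The sketch below only builds the request arguments. The helper name `data_location_grant_request` and the bucket name are hypothetical, and the location ARN has to match a location that is actually registered in Lake Formation.

```python
def data_location_grant_request(role_arn, bucket, prefix=""):
    """Build GrantPermissions arguments for one registered S3 location."""
    resource_arn = f"arn:aws:s3:::{bucket}"
    if prefix:
        # Locations can be registered at a prefix rather than the bucket root.
        resource_arn += "/" + prefix.strip("/")
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"DataLocation": {"ResourceArn": resource_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


request = data_location_grant_request(
    "arn:aws:iam::123456789012:role/RudolGlueRole", "my-datalake-bucket"
)
# With boto3 installed and sufficient Lake Formation rights, this could be sent as:
#   boto3.client("lakeformation").grant_permissions(**request)
```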

Restrict access by resource

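If you want to limit Glue access to specific databases or tables instead of "Resource": "*", you can scope the Glue statement of the policy from step 1. Replace the region, account ID, and database name in the example with your own values: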

"Resource": [
  "arn:aws:glue:us-east-1:123456789012:catalog",
  "arn:aws:glue:us-east-1:123456789012:database/my_database",
  "arn:aws:glue:us-east-1:123456789012:table/my_database/*"
]
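Each scoped database needs three kinds of ARN: the catalog, the database, and its tables. As a small sketch, the helper below generates that list for one or more databases. The function name `scoped_glue_resources` is hypothetical, and the region, account ID, and database name are the example values used above.

```python
import json


def scoped_glue_resources(region, account_id, databases):
    """Return Glue resource ARNs covering the catalog and the given databases."""
    base = f"arn:aws:glue:{region}:{account_id}"
    # The catalog ARN is listed once, whatever the number of databases.
    resources = [f"{base}:catalog"]
    for database in databases:
        resources.append(f"{base}:database/{database}")
        # The wildcard covers every table in this database.
        resources.append(f"{base}:table/{database}/*")
    return resources


print(json.dumps(scoped_glue_resources("us-east-1", "123456789012", ["my_database"]), indent=2))
```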
