# Repository Uploader

A command-line tool to upload all files from GitLab or GitHub repositories to a Product Copilot Knowledge Bucket. The tool traverses repository files and uploads them to the specified knowledge bucket, optionally clearing the bucket first (see `--clean`). It supports both single repositories and multiple repositories in a single command.

## Features

- Supports both GitLab and GitHub repositories
- **Supports multiple repositories** - process multiple repositories in a single command using comma-separated paths
- Connects to the GitLab/GitHub API to traverse repository files
- Filters out binary files and common non-useful directories
- Uploads files with metadata including source URL and last update time
- Supports gitlab.com and github.com as well as self-hosted GitLab and GitHub Enterprise instances
- Environment variable support for tokens
- **Error handling** - continues processing other repositories if one fails

## Installation

### Prerequisites

- GitLab API token with read access to the repository (for GitLab)
- GitHub API token with read access to the repository (for GitHub)
- Product Copilot API token (for the Knowledge Bucket to upload files to)

### Download Pre-built Binaries

Pre-built binaries for various platforms can be found at [repo-uploader.product-copilot.ai](https://repo-uploader.product-copilot.ai).

On macOS, the downloaded binary may need to be removed from quarantine and marked as executable:

```bash
xattr -d com.apple.quarantine ./repo-uploader
chmod +x ./repo-uploader
```

### Build from source

To build this tool from source, you'll need Go 1.25.1 or later.

```bash
git clone REPOSITORY_URL
cd repo-uploader
go build -o repo-uploader
```

## Usage

### GitLab Repository

```bash
./repo-uploader \
  --platform gitlab \
  --gitlab-token YOUR_GITLAB_TOKEN \
  --product-copilot-token YOUR_PRODUCT_COPILOT_TOKEN \
  --repository-path "group/project-name"
```

### GitHub Repository

```bash
./repo-uploader \
  --platform github \
  --github-token YOUR_GITHUB_TOKEN \
  --product-copilot-token YOUR_PRODUCT_COPILOT_TOKEN \
  --repository-path "owner/repo-name"
```

### Multiple Repositories

You can process multiple repositories in a single command by providing a comma-separated list of repository paths:

#### Multiple GitLab Repositories

```bash
./repo-uploader \
  --platform gitlab \
  --gitlab-token YOUR_GITLAB_TOKEN \
  --product-copilot-token YOUR_PRODUCT_COPILOT_TOKEN \
  --repository-path "group/project1,group/project2,another-group/project3"
```

#### Multiple GitHub Repositories

```bash
./repo-uploader \
  --platform github \
  --github-token YOUR_GITHUB_TOKEN \
  --product-copilot-token YOUR_PRODUCT_COPILOT_TOKEN \
  --repository-path "owner/repo1,owner/repo2,another-owner/repo3"
```

#### Mixed Processing with Error Handling

If any repository path is invalid or fails to process, the tool will:

- Log the error for that specific repository
- Continue processing the remaining repositories
- Provide a summary at the end showing which repositories succeeded and which failed

### For GitHub Enterprise

```bash
./repo-uploader \
  --platform github \
  --github-token YOUR_GITHUB_TOKEN \
  --product-copilot-token YOUR_PRODUCT_COPILOT_TOKEN \
  --repository-path "owner/repo-name" \
  --github-url "https://github.enterprise.com"
```

### Using Environment Variables

```bash
# For GitLab
export GITLAB_TOKEN="your_gitlab_token_here"
export PRODUCT_COPILOT_TOKEN="your_product_copilot_token_here"
./repo-uploader --platform gitlab --repository-path "group/project-name"

# For GitHub
export GITHUB_TOKEN="your_github_token_here"
export PRODUCT_COPILOT_TOKEN="your_product_copilot_token_here"
./repo-uploader --platform github --repository-path "owner/repo-name"

# For GitHub Enterprise
export GITHUB_TOKEN="your_github_token_here"
export GITHUB_BASE_URL="https://github.enterprise.com"
export PRODUCT_COPILOT_TOKEN="your_product_copilot_token_here"
./repo-uploader --platform github --repository-path "owner/repo-name"
```

### For Self-hosted GitLab

```bash
./repo-uploader \
  --platform gitlab \
  --gitlab-token YOUR_GITLAB_TOKEN \
  --product-copilot-token YOUR_PRODUCT_COPILOT_TOKEN \
  --repository-path "group/project-name" \
  --gitlab-url "https://gitlab.example.com"
```

### Command Line Options

- `--clean`: Delete existing documents in the knowledge bucket before uploading (optional, defaults to false)
- `--platform`: Platform to use (gitlab or github) (required)
- `--branch`: Branch to use (e.g., 'main' or 'develop') (optional, defaults to the repository's default branch)
- `--gitlab-url`: GitLab base URL (optional, defaults to gitlab.com)
- `--gitlab-token`: GitLab API token (required for GitLab, or set GITLAB_TOKEN env var)
- `--github-url`: GitHub base URL (optional, defaults to github.com, use for GitHub Enterprise)
- `--github-token`: GitHub API token (required for GitHub, or set GITHUB_TOKEN env var)
- `--repository-path`: Repository path(s) - single path (e.g., 'group/project' for GitLab or 'owner/repo' for GitHub) or comma-separated list for multiple repositories (required)
- `--product-copilot-token`: Product Copilot API token (required, or set PRODUCT_COPILOT_TOKEN env var)
- `--help`: Show help message
- `version`: Show version information

## How It Works

1. **Authentication**: Connects to GitLab/GitHub using the provided API token
2. **Repository Access**: Retrieves the specified repository information
3. **File Traversal**: Recursively traverses all files in the repository
4. **File Filtering**: Skips binary files, build artifacts, and common non-useful directories
5. **Knowledge Bucket Clear**: When `--clean` is set, clears the existing knowledge bucket for the project
6. **File Upload**: Uploads each file to the Product Copilot Knowledge Bucket with metadata

## Platform-Specific Features

### GitLab

- Supports both gitlab.com and self-hosted GitLab instances
- Uses GitLab API v4
- Handles GitLab-specific file encoding (base64)

### GitHub

- Supports both github.com and GitHub Enterprise instances
- Uses GitHub API v3
- Handles GitHub's content encoding automatically
- Supports public and private repositories (with appropriate token permissions)

## Token Requirements

### GitLab Token

- Access level: Reporter or higher
- Scopes: `read_api`, `read_repository`

### GitHub Token

- Permissions: `Contents: Read` for the target repository
- For private repositories, the token must have access to the repository

## File Filtering

The tool automatically skips the following types of files and directories:

### Skipped File Extensions

- Images: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.ico`, `.svg`
- Documents: `.pdf`, `.doc`, `.docx`, `.xls`, `.xlsx`, `.ppt`, `.pptx`
- Archives: `.zip`, `.tar`, `.gz`, `.rar`, `.7z`
- Binaries: `.exe`, `.dll`, `.so`, `.dylib`, `.bin`, `.obj`, `.o`, `.a`
- Media: `.mp3`, `.mp4`, `.avi`, `.mov`, `.wmv`
- Fonts: `.ttf`, `.otf`, `.woff`, `.woff2`

### Skipped Directories

- `.git`, `node_modules`, `vendor`, `.vscode`, `.idea`
- `target`, `build`, `dist`, `out`, `bin`

## Knowledge Bucket API

The tool uploads files to the Product Copilot Knowledge Bucket using the following API:

**Endpoint**: `POST https://doc.product-copilot.ai/api/v2/documents`

### Request Body

```json
{
  "subunitId": "gitlab_project_name",
  "sourceUri": "gitlab_file_url",
  "title": "file_name",
  "lastUpdateTime": "last update timestamp",
  "content": "byte[] content"
}
```

### Authorization

The API requires an `Authorization` header with a bearer token:

```
Authorization: Bearer YOUR_PRODUCT_COPILOT_TOKEN
```

## Error Handling

The application provides detailed error messages for common issues:

- Invalid GitLab or GitHub tokens, or repository access issues
- Network connectivity problems
- Product Copilot API authentication failures
- File processing errors (individual files are skipped, processing continues)

## Examples

### Example 1: Upload from gitlab.com repository

```bash
./repo-uploader \
  --platform gitlab \
  --gitlab-token "glpat-xxxxxxxxxxxxxxxxxxxx" \
  --product-copilot-token "pc-xxxxxxxxxxxxxxxxxxxx" \
  --repository-path "mycompany/awesome-project"
```

### Example 2: Upload from self-hosted GitLab

```bash
./repo-uploader \
  --platform gitlab \
  --gitlab-token "glpat-xxxxxxxxxxxxxxxxxxxx" \
  --product-copilot-token "pc-xxxxxxxxxxxxxxxxxxxx" \
  --repository-path "engineering/backend-services" \
  --gitlab-url "https://gitlab.mycompany.com"
```

### Example 3: Multiple repositories from GitLab

```bash
./repo-uploader \
  --platform gitlab \
  --gitlab-token "glpat-xxxxxxxxxxxxxxxxxxxx" \
  --product-copilot-token "pc-xxxxxxxxxxxxxxxxxxxx" \
  --repository-path "group1/project1,group1/project2,group2/backend-service"
```

### Example 4: Multiple GitHub repositories

```bash
./repo-uploader \
  --platform github \
  --github-token "ghp_xxxxxxxxxxxxxxxxxxxx" \
  --product-copilot-token "pc-xxxxxxxxxxxxxxxxxxxx" \
  --repository-path "myorg/frontend,myorg/backend,myorg/shared-utils"
```

### Example 5: Using environment variables

```bash
export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"
export PRODUCT_COPILOT_TOKEN="pc-xxxxxxxxxxxxxxxxxxxx"
./repo-uploader --platform gitlab --repository-path "data-team/ml-models"
```

## Troubleshooting

### Common Issues

1. **"failed to get project" error**: Check that your GitLab token has access to the repository and that the repository path is correct
2. **"upload failed with status" error**: Verify that your Product Copilot token is valid and has permission to upload documents
3. **Network timeouts**: The tool has a 30-second timeout for API calls; very large files might time out

### Getting API Tokens

**GitLab Token:**

1. Go to GitLab → User Settings → Access Tokens
2. Create a token with the `read_api` and `read_repository` scopes
3. Use the generated token with the `--gitlab-token` flag

**GitHub Token:**

1. Go to GitHub → Settings → Developer settings → Personal access tokens
2. Create a token with `repo` scope for private repositories or `public_repo` for public repositories
3. Use the generated token with the `--github-token` flag

**Product Copilot Token:**

1. Contact your Product Copilot administrator
2. Use the provided token with the `--product-copilot-token` flag
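
For readers who want to reason about what gets uploaded, the rules in the File Filtering section and the request described in the Knowledge Bucket API section can be sketched in Go roughly as follows. This is an illustrative sketch, not the tool's actual source: the names `shouldSkip`, `newUploadRequest`, and `document` are hypothetical, the skip lists are abbreviated, and the case-insensitive extension match, the RFC 3339 timestamp, and the base64 encoding of `content` are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"path"
	"strings"
	"time"
)

// Endpoint as documented in the Knowledge Bucket API section.
const endpoint = "https://doc.product-copilot.ai/api/v2/documents"

// Abbreviated subsets of the documented skip lists.
var skippedExts = map[string]bool{".png": true, ".pdf": true, ".zip": true, ".exe": true, ".mp4": true, ".woff2": true}
var skippedDirs = map[string]bool{".git": true, "node_modules": true, "vendor": true, "build": true, "dist": true}

// document mirrors the documented request body; the Go field types are assumptions.
type document struct {
	SubunitID      string `json:"subunitId"`
	SourceURI      string `json:"sourceUri"`
	Title          string `json:"title"`
	LastUpdateTime string `json:"lastUpdateTime"`
	Content        []byte `json:"content"` // encoding/json emits []byte as a base64 string
}

// shouldSkip reports whether a repository-relative path is filtered out.
// Assumption: a skipped directory name matches at any depth, and extensions
// are compared case-insensitively.
func shouldSkip(filePath string) bool {
	parts := strings.Split(filePath, "/")
	for _, dir := range parts[:len(parts)-1] {
		if skippedDirs[dir] {
			return true
		}
	}
	return skippedExts[strings.ToLower(path.Ext(filePath))]
}

// newUploadRequest builds, but does not send, the POST request for one file.
func newUploadRequest(token string, doc document) (*http.Request, error) {
	body, err := json.Marshal(doc)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	for _, p := range []string{"src/main.go", "node_modules/lib/index.js", "docs/logo.png"} {
		fmt.Printf("%-28s skip=%v\n", p, shouldSkip(p))
	}
	doc := document{
		SubunitID:      "project-name",
		SourceURI:      "https://gitlab.example.com/group/project-name/-/blob/main/README.md",
		Title:          "README.md",
		LastUpdateTime: time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC).Format(time.RFC3339),
		Content:        []byte("# Hello\n"),
	}
	req, err := newUploadRequest("YOUR_PRODUCT_COPILOT_TOKEN", doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, req.Header.Get("Content-Type"))
}
```

The request is only constructed here, never sent, so the sketch runs without a token or network access. Sending it would use an `http.Client`, presumably with a timeout along the lines of the 30-second limit mentioned under Troubleshooting.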