Free automation template

Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls


Description of the llms.txt generation workflow

This automation template helps you generate an llms.txt file from a Screaming Frog export. Screaming Frog is a popular website crawler.

How it works

A form lets you enter the following data:

  • The name of the website
  • A short description
  • The internal_html.csv file from your Screaming Frog export

Once the form is submitted, the workflow runs automatically and the finished llms.txt file can be downloaded directly from the n8n UI.
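For reference, the generated file uses the simple layout assembled inside the workflow: a heading with the website name, a quoted short description, then one line per URL in the form "- [title](url): description" (the description part is omitted when the page has no meta description). The site and pages below are made up for illustration:

```
# The best website ever
> A short description of the website, in the website's language.

- [Home](https://example.com/): Welcome page with an overview of the site.
- [Blog](https://example.com/blog/): Articles and guides about the topic.
- [Contact](https://example.com/contact/)
```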

Downloading the file

The last node in this workflow is "Convert to File", so the file is downloaded directly from the n8n UI. However, you can easily add a node (e.g., Google Drive or OneDrive) to automatically upload the file to a location of your choice.
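Inside n8n you would simply swap the final placeholder node for the built-in Google Drive or OneDrive node. Purely as an illustration of the same idea outside n8n, here is a minimal Python sketch that uploads an already downloaded llms.txt to Google Drive; the folder ID, service-account key path, and file name are placeholders, not part of the template:

```python
# Sketch: upload a locally downloaded llms.txt to Google Drive.
# Assumes google-api-python-client and a service-account JSON key;
# FOLDER_ID and the key path are placeholders, not part of the template.
from google.oauth2 import service_account
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

FOLDER_ID = "your-drive-folder-id"  # hypothetical target folder

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/drive.file"],
)
drive = build("drive", "v3", credentials=creds)

# Name the file after the website so files stay distinguishable
# when you work on multiple sites, as the template notes recommend.
metadata = {"name": "example-site-llms.txt", "parents": [FOLDER_ID]}
media = MediaFileUpload("llms.txt", mimetype="text/plain")
uploaded = drive.files().create(body=metadata, media_body=media, fields="id").execute()
print("Uploaded file id:", uploaded["id"])
```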

AI-powered filtering (optional)

The workflow includes a text classifier node, which is disabled by default. You can enable it to apply smarter filtering when selecting URLs for the llms.txt file. It is worth adjusting the description in the classifier node to specify more precisely which types of URLs should be included.
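As a rough illustration of what that classifier does (not the template's actual node, which uses n8n's LangChain Text Classifier with gpt-4o-mini), here is a minimal Python sketch that asks an OpenAI model to sort each page into the same two categories, useful_content and other_content; the prompt wording and helper function are assumptions for illustration:

```python
# Sketch of the optional AI filter: classify crawled pages into the two
# categories used by the template's Text Classifier node.
# Assumes the openai Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

CATEGORIES = {
    "useful_content": "Pages likely to contain high-quality content worth listing in llms.txt.",
    "other_content": "Pages that should not be included (e.g., pagination or low-value content).",
}

def classify_page(url: str, title: str, description: str, word_count: str) -> str:
    """Return 'useful_content' or 'other_content' for one page (illustrative helper)."""
    prompt = (
        "Classify the page below into exactly one category and reply with the "
        "category name only.\n"
        f"Categories: {CATEGORIES}\n\n"
        f"url: {url}\ntitle: {title}\ndescription: {description}\n"
        f"word count: {word_count}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # same model as the template's OpenAI Chat Model node
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.choices[0].message.content.strip()
    return answer if answer in CATEGORIES else "other_content"
```

Keep in mind, as the template's notes point out, that classifying every URL means one model call per page, so token usage and API cost grow with the size of the crawl.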

How to use

  1. Crawl the website in Screaming Frog
  2. Export the "internal_html" section in CSV format
  3. In n8n, click "Test Workflow", fill in the form, and upload the internal_html.csv file
  4. Once the workflow completes, open the "Export to File" node and download the output
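To make the transformation concrete, here is a minimal standalone sketch of what the workflow does with the export, using only the Python standard library: read internal_html.csv, keep pages with status 200, "Indexable" indexability, and a text/html content type, then emit one "- [title](url): description" line per page under the website name and description. The file paths, site name, and description are placeholders; inside n8n these steps are handled by the "Set useful fields", "Filter URLs", "Set Field - llms.txt Row", and "Summarize - Concatenate" nodes.

```python
# Standalone sketch of the template's CSV -> llms.txt transformation.
# Column names match the English Screaming Frog export ("Address", "Title 1", ...);
# the workflow itself also accepts the French, Italian, German, and Spanish variants.
import csv

SITE_NAME = "The best website ever"          # placeholder, taken from the form
SITE_DESCRIPTION = "A short description."    # placeholder, taken from the form

rows = []
with open("internal_html.csv", newline="", encoding="utf-8") as f:
    for page in csv.DictReader(f):
        # Same conditions as the "Filter URLs" node.
        if (
            page.get("Status Code") == "200"
            and page.get("Indexability") == "Indexable"
            and "text/html" in page.get("Content Type", "")
        ):
            title = page.get("Title 1", "")
            url = page.get("Address", "")
            description = page.get("Meta Description 1", "")
            # Same row format as the "Set Field - llms.txt Row" node:
            # "- [title](url): description", with no description part if it is empty.
            suffix = f": {description}" if description else ""
            rows.append(f"- [{title}]({url}){suffix}")

llms_txt = f"# {SITE_NAME}\n> {SITE_DESCRIPTION}\n\n" + "\n".join(rows)

with open("llms.txt", "w", encoding="utf-8") as f:
    f.write(llms_txt)
```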

Example use cases

This automation is useful in many scenarios related to website optimization and content management:

  • Generating exclusion files for SEO tools
  • Creating a site map for search engines
  • Automating the website audit process
  • Preparing data for site structure analysis
  • Optimizing indexing by search engine crawlers
  • Building URL lists for A/B tests
  • Generating client reports with a list of subpages

Recommended usage

The most convenient way to use this workflow is directly in the n8n UI: click 'Test Workflow' and submit the file through the form.

   Copy the template code   
{"id":"","meta":{"instanceId":"","templateCredsSetupCompleted":true},"name":"Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls","tags":[],"nodes":[{"id":"ca701618-b2d5-48ee-a503-d3513d018a65","name":"Sticky Note","type":"n8n-nodes-base.stickyNote","position":[360,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Form - Screaming Frog internal_html.csv upload nnThis form node is used to trigger the workflow. nnIt contains **three input fields**: n- Name of the website n- Short description of the website n- **Screaming Frog** export containing the internal URLs nnnnIt is recommended to use the **internal_html.csv** export, but **internal_all.csv** will also work, as the workflow includes a filter to process only indexable URLs.n"},"typeVersion":1},{"id":"bc040ca0-f38d-4458-a60c-17f71dbfd1ea","name":"Sticky Note1","type":"n8n-nodes-base.stickyNote","position":[780,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Extract data from Screaming Frog filennThis node extracts data from the **CSV file** provided by the user. nnIt produces an output that is **easily usable** in the following nodes. nn⚠️ **Caution:** nIf the uploaded file is **not** the expected Screaming Frog export, the workflow will still proceed but will likely **fail in the next steps** due to missing required fields. nn"},"typeVersion":1},{"id":"f71a7d10-847d-48e7-8820-ec0c1e7ea055","name":"Sticky Note2","type":"n8n-nodes-base.stickyNote","position":[1200,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Set Useful Fields nnThis node sets **7 key fields** from the Screaming Frog export: nn- `url` → from the **"Address"** column n- `title` → from the **"Title 1"** column n- `description` → from the **"Meta Description 1"** column n- `status` → from the **"Status Code"** column n- `indexability` → from the **"Indexability"** column n- `content_type` → from the **"Content Type"** column n- `word_count` → from the **"Word Count"** column nnn**Multi-language compatibility** nIf you're using Screaming Frog in **French, Italian, German, or Spanish**, the column names will be different. nHowever, the workflow is designed to handle this, so it will **still work correctly**! 
🥳n"},"typeVersion":1},{"id":"6f6546b8-adeb-4998-ae19-d93525337eb7","name":"Set useful fields","type":"n8n-nodes-base.set","position":[1340,60],"parameters":{"options":{},"assignments":{"assignments":[{"id":"0e7d4a06-83fc-4834-93fe-2e758cbe2307","name":"url","type":"string","value":"={{ $json.Address || $json.Adresse || $json.Dirección || $json.Indirizzo }}"},{"id":"c82f4d4c-9d0b-4c7d-9647-5d0240b58643","name":"title","type":"string","value":"={{ $json['Title 1'] || $json['Titolo 1'] || $json['Titolo 1'] || $json['Título 1'] || $json['Titel 1'] }}"},{"id":"abea81db-ce3b-4ac1-bd21-09ccfffb567a","name":"description","type":"string","value":"={{ $json['Meta Description 1'] || $json['Meta description 1'] }}"},{"id":"2ca75d74-70f8-400b-b862-9da186135915","name":"statut","type":"string","value":"={{ $json['Status Code'] || $json['Code HTTP'] || $json['Status-Code'] || $json['Código de respuesta'] || $json['Codice di stato']}}"},{"id":"754d3202-38b0-4d79-ba24-8078b3244307","name":"indexability","type":"string","value":"={{ $json.Indexability || $json.Indexabilité || $json.Indicizzabilità || $json.Indexabilidad || $json.Indexierbarkeit}}"},{"id":"8bc6583d-bb34-4d22-b310-fe79bb8ac85d","name":"content_type","type":"string","value":"={{ $json['Content Type'] || $json['Type de contenu'] || $json['Tipo di contenuto'] || $json['Tipo de contenido'] || $json['Inhaltstyp']}}"},{"id":"c874ba1a-769e-43d3-9555-8c9914ca9b76","name":"word_count","type":"string","value":"={{ $json['Word Count'] || $json['Nombre de mots'] || $json['Conteggio delle parole'] || $json['Conteggio delle parole'] || $json['Recuento de palabras'] || $json['Wortanzahl'] }}"}]}},"typeVersion":3.4},{"id":"1a9af7a0-d2d5-44cb-9770-2d5a1e5706f4","name":"Text Classifier","type":"@n8n/n8n-nodes-langchain.textClassifier","disabled":true,"position":[2260,60],"parameters":{"options":{},"inputText":"=url : {{ $json.url }}ntitle : {{ $json.title }}ndescription : {{ $json.description }}nwords count : {{ $json.word_count }}","categories":{"categories":[{"category":"useful_content","description":"Pages that are likely to contain high-quality content, making them suitable for inclusion in a file that aids content discovery for an LLM. "},{"category":"other_content","description":"Pages that should not be included (e.g., pagination, or low-value content)."}]}},"typeVersion":1},{"id":"74a4e378-4228-4142-92ca-e541efde2b15","name":"OpenAI Chat Model","type":"@n8n/n8n-nodes-langchain.lmChatOpenAi","position":[2180,240],"parameters":{"model":{"__rl":true,"mode":"list","value":"gpt-4o-mini"},"options":{}},"credentials":{"openAiApi":{"id":"","name":"OpenAi Connection"}},"typeVersion":1.2},{"id":"63dc6cfe-bc73-43b5-8c7d-4f5fd6501d3b","name":"No Operation, do nothing","type":"n8n-nodes-base.noOp","position":[2580,200],"parameters":{},"typeVersion":1},{"id":"cb555b99-9e63-4b6b-a1fc-512b5467d666","name":"Sticky Note3","type":"n8n-nodes-base.stickyNote","position":[1620,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Filter URLs nnThis **filter node** is used to keep only the URLs that meet the following conditions: n- `status` = **200** n- `indexability` = **indexable** n- `content_type` contains **text/html** nnnThese filters are even **more useful** if the uploaded file is an **internal_all.csv** instead of an **internal_html.csv**. nn### **Tips:** nYou can **add more filters** to refine the URLs included in your `llms.txt` file. nn💡 **Examples:** n- **Filter by word count** → Ensure pages contain **enough text content**. 
n- **Filter by URL path** → Keep only **specific folders or categories** in the `llms.txt` file. n- **Filter by meta description** → Exclude URLs **without a meta description**, as this field will be used in the `llms.txt` file to describe each piece of content. n"},"typeVersion":1},{"id":"e34e56e2-5cc8-4e50-bfb0-3aa2e5e04abf","name":"Filter URLs","type":"n8n-nodes-base.filter","position":[1740,60],"parameters":{"options":{},"conditions":{"options":{"version":2,"leftValue":"","caseSensitive":true,"typeValidation":"strict"},"combinator":"and","conditions":[{"id":"cef4feaa-1c46-45b1-92b7-f5c2051b1dc5","operator":{"type":"number","operation":"equals"},"leftValue":"={{ Number($json.statut) }}","rightValue":200},{"id":"bb821656-9740-4da4-8aa9-f65ad098c470","operator":{"type":"boolean","operation":"true","singleValue":true},"leftValue":"={{ ["Indexable", "Indicizzabile", "Indexierbar"].includes($json.indexability) }}","rightValue":"={{ "Indexable" || "Indicizzabile" }}"},{"id":"5c93ddb8-8091-406a-bc04-fa14e8b73fb9","operator":{"type":"string","operation":"contains"},"leftValue":"={{ $json.content_type }}","rightValue":"text/html"}]}},"typeVersion":2.2},{"id":"b98f19a8-afd3-4d26-8063-dee3ee75055f","name":"Sticky Note4","type":"n8n-nodes-base.stickyNote","position":[2040,-800],"parameters":{"color":2,"width":740,"height":1160,"content":"## Text Classifiernn🚫 **This node is deactivated by default** in the template. nnYou can **enable it** if you want to add a more **"intelligent" 🤓 filter** to refine the URLs included in the `llms.txt` file, helping LLMs discover and prioritize valuable content.nn### How It Works:nThis node has **two outputs**: n- **`useful_content`** → Pages that are **likely to contain high-quality content**, making them suitable for inclusion in a file that **aids content discovery for an LLM**. n- **`other_content`** → Pages that should **not** be included (e.g., pagination or low-value content). nnnYou can **modify the description** in the node to fine-tune the classification according to your needs. nn### Input Fields:n- **url** → `{{ $json.url }}` n- **title** → `{{ $json.title }}` n- **description** → `{{ $json.description }}` n- **word_count** → `{{ $json.word_count }}` nn### Why use an LLM? nA **language model (LLM)** can **analyze** the **URL, title, and description** to identify pages that **most likely contain meaningful and relevant content**. nThis allows it to **prioritize valuable pages** and structure the data for **better content discovery and training purposes**. nn### **For large websites** nIf you have a **very large website**, consider using a **Loop Over Items** node to make the workflow **more robust** and ensure all pages are processed. nAlso, using a **Loop Over Items** node make it **easier** to handle: n- **Timeouts** n- **API quotas** n- **Other scalability issues**nn### Tokens usagenFinally, keep in mind that **more pages mean more tokens and more billed LLM API calls**.nnnnnnnn"},"typeVersion":1},{"id":"63e3ea7a-cec3-442c-9812-771def0a9949","name":"Sticky Note5","type":"n8n-nodes-base.stickyNote","position":[2840,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Set Field - llms.txt RownnThis node **sets** the row format for the `llms.txt` file. 
nn### Row Structure:nEach row follows this format: nn- `- [title](link): description` nnIf the URL **has no description** (from the **Meta Description** in the Screaming Frog export), the row will be: nn- `- [title](link)` n"},"typeVersion":1},{"id":"78f58220-feb5-4044-b994-39a0e4f1e9e4","name":"Sticky Note6","type":"n8n-nodes-base.stickyNote","position":[3260,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Summarize - ConcatenatennThis node concatenates all the output from the previous node, ensuring each row is on a separate line."},"typeVersion":1},{"id":"7a119633-7cd3-4de5-a1cd-7f708e1abf4a","name":"Sticky Note7","type":"n8n-nodes-base.stickyNote","position":[3680,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Set Fields - llms.txt ContentnnThis node sets the content of the `llms.txt` file using:nn- The **website title** provided in the form (first node).n- The **website description** provided in the form (first node).n- The output from the previous node, which includes all the URLs, their titles, and their descriptions that will appear in the `llms.txt` file.n"},"typeVersion":1},{"id":"554f6858-68e8-4b35-a6c4-21bed6832323","name":"Sticky Note8","type":"n8n-nodes-base.stickyNote","position":[4100,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## Generate llms.txt filennThis node **creates** the `llms.txt` file, which can be **downloaded directly** within n8n. n"},"typeVersion":1},{"id":"24bdefba-e2f2-41f0-93e7-9f8d2fc11f43","name":"Sticky Note9","type":"n8n-nodes-base.stickyNote","position":[4520,-500],"parameters":{"color":7,"width":360,"height":860,"content":"## upload file anywherennInstead of downloading the file directly from the n8n workflow, you can **replace this node node** with a Drive node (e.g., **Google Drive** or **OneDrive**) to upload the `llms.txt` file to a folder of your choice. n n**Name the file properly** (e.g., include the website name) to make it easier to find and distinguish between files when working on multiple websites. n"},"typeVersion":1},{"id":"a3be51e3-810c-40a7-a996-98a3d383c2b9","name":"Summarize - Concatenate","type":"n8n-nodes-base.summarize","position":[3380,40],"parameters":{"options":{},"fieldsToSummarize":{"values":[{"field":"llmTxtRow","separateBy":"n","aggregation":"concatenate"}]}},"typeVersion":1.1},{"id":"8d3a892a-3d11-4d8a-8ec6-84f8f3af1183","name":"Set Fields - llms.txt Content","type":"n8n-nodes-base.set","position":[3820,40],"parameters":{"options":{},"assignments":{"assignments":[{"id":"97062a99-e944-4e1e-89b1-62cf9e3462dd","name":"llmsTxtFile","type":"string","value":"=# {{ $('Form - Screaming frog internal_html.csv upload').item.json['What is the name of your website?'] }}n> {{ $('Form - Screaming frog internal_html.csv upload').item.json['Can you provide a short description of your website? (in the language of the website)'] }}nn{{ $json.concatenated_llmTxtRow }}"}]}},"typeVersion":3.4},{"id":"bc2a692a-47ea-4bf1-a102-e607fd544158","name":"upload file anywhere","type":"n8n-nodes-base.noOp","position":[4640,40],"parameters":{},"typeVersion":1},{"id":"404510a2-35b2-44cf-9d02-eb0abcf4e9b3","name":"Set Field - llms.txt Row","type":"n8n-nodes-base.set","position":[2960,40],"parameters":{"options":{},"assignments":{"assignments":[{"id":"95e75caa-8110-476b-9cb1-73c15361fa56","name":"llmTxtRow","type":"string","value":"=- [{{ $json.title }}]({{ $json.url }}){{ $json.description ? 
': ' + $json.description : '' }}"}]}},"typeVersion":3.4},{"id":"f54d51f2-17bc-4c58-b177-0e77e16f7b72","name":"Sticky Note10","type":"n8n-nodes-base.stickyNote","position":[-420,-1020],"parameters":{"color":5,"width":700,"height":1380,"content":"# Generate AI-Ready llms.txt Files from Screaming Frog Website Crawls nnThis workflow helps you generate an **llms.txt** file (if you're unfamiliar with it, check out [this article](https://towardsdatascience.com/llms-txt-414d5121bcb3/)) using a **Screaming Frog export**. nn[Screaming Frog](https://www.screamingfrog.co.uk/seo-spider/) is a well-known website crawler. nYou can easily crawl a website. Then, export the **"internal_html"** section in CSV format. nn## How It Works: nnA **form** allows you to enter: n- The **name of the website** n- A **short description** n- The **internal_html.csv** file from your Screaming Frog export nnnOnce the form is submitted, the **workflow is triggered automatically**, and you can **download the llms.txt file directly from n8n**. nn## Downloading the FilenSince the last node in this workflow is **"Convert to File"**, you will need to **download the file directly from the n8n UI**. nHowever, you can easily **add a node** (e.g., Google Drive, OneDrive) to automatically upload the file **wherever you want**. nn## AI-Powered Filtering (Optional): nThis workflow includes a **text classifier node**, which is **deactivated by default**. n- You can **activate it** to apply a more **intelligent filter** to select URLs for the `llms.txt` file. n- Consider modifying the **description** in the classifier node to specify the type of URLs you want to include. nn## How to Use This Workflow nn1. **Crawl the website** you want to generate an `llms.txt` file for using **Screaming Frog**. n2. **Export the "internal_html"** section in CSV format. n ![Screaming Frog internal html export](https://i.imgur.com/M0nJQiV.png) n3. In **n8n**, click **"Test Workflow"**, fill in the form, and **upload** the `internal_html.csv` file. n4. Once the workflow is complete, go to the **"Export to File"** node and **download the output**. nn**That's it! You now have your llms.txt file!** nnnn**Recommended Usage:** nUse this workflow **directly in the n8n UI by clicking** 'Test Workflow' and uploading the file in the form."},"typeVersion":1},{"id":"e33104af-802a-43f2-b26d-f368f7de2fd7","name":"Form - Screaming frog internal_html.csv upload","type":"n8n-nodes-base.formTrigger","position":[460,60],"webhookId":"8791f39a-3d81-405c-b177-0a733ebf74cb","parameters":{"options":{"buttonLabel":"Get the llms.txt file"},"formTitle":"llms.txt Generator - From Screaming Frog export","formFields":{"values":[{"fieldLabel":"What is the name of your website?","placeholder":"Example : The best website ever","requiredField":true},{"fieldLabel":"Can you provide a short description of your website? 
(in the language of the website)","placeholder":"Example : This is the best website ever because all the content is engaging and valuable.","requiredField":true},{"fieldType":"file","fieldLabel":"screaming_frog_export","multipleFiles":false,"requiredField":true,"acceptFileTypes":".csv"}]},"responseMode":"lastNode","formDescription":"Generate a simple llms.txt file from a Screaming Frog ExportnIt is recommended to use the internal_html.csv export, although internal_all.csv will also work.nnFill in the fields in this form.Just fill in the fields in this form 😄"},"typeVersion":2.2},{"id":"f6b17fdd-a098-411e-8d53-3f6e638cc3ba","name":"Extract data from Screaming Frog file","type":"n8n-nodes-base.extractFromFile","position":[900,60],"parameters":{"options":{},"operation":"xls","binaryPropertyName":"screaming_frog_export"},"typeVersion":1},{"id":"6bbd8d1f-3322-4c6d-af08-c842386239ce","name":"Generate llms.txt file","type":"n8n-nodes-base.convertToFile","position":[4220,40],"parameters":{"options":{"encoding":"utf8","fileName":"llms.txt"},"operation":"toText","sourceProperty":"llmsTxtFile"},"typeVersion":1.1}],"active":false,"pinData":{},"settings":{"executionOrder":"v1"},"versionId":"","connections":{"Filter URLs":{"main":[[{"node":"Text Classifier","type":"main","index":0}]]},"Text Classifier":{"main":[[{"node":"Set Field - llms.txt Row","type":"main","index":0}],[{"node":"No Operation, do nothing","type":"main","index":0}]]},"OpenAI Chat Model":{"ai_languageModel":[[{"node":"Text Classifier","type":"ai_languageModel","index":0}]]},"Set useful fields":{"main":[[{"node":"Filter URLs","type":"main","index":0}]]},"Generate llms.txt file":{"main":[[]]},"Summarize - Concatenate":{"main":[[{"node":"Set Fields - llms.txt Content","type":"main","index":0}]]},"Set Field - llms.txt Row":{"main":[[{"node":"Summarize - Concatenate","type":"main","index":0}]]},"Set Fields - llms.txt Content":{"main":[[{"node":"Generate llms.txt file","type":"main","index":0}]]},"Extract data from Screaming Frog file":{"main":[[{"node":"Set useful fields","type":"main","index":0}]]},"Form - Screaming frog internal_html.csv upload":{"main":[[{"node":"Extract data from Screaming Frog file","type":"main","index":0}]]}}}