Upload files to "/"

2024-11-26 06:31:52 -08:00
commit 76669c0ed3
4 changed files with 2408 additions and 0 deletions


@ -0,0 +1,111 @@
Download files from the Hub
The huggingface_hub library provides functions to download files from the repositories stored on the Hub. You can use these functions independently or integrate them into your own library, making it more convenient for your users to interact with the Hub. This guide will show you how to:
Download and cache a single file.
Download and cache an entire repository.
Download files to a local folder.
Download a single file
The hf_hub_download() function is the main function for downloading files from the Hub. It downloads the remote file, caches it on disk (in a version-aware way), and returns its local file path.
The returned filepath is a pointer into the HF local cache. It is therefore important not to modify the file, to avoid corrupting the cache. To learn more about how files are cached, refer to our caching guide.
From the latest version
Select the file to download using the repo_id, repo_type and filename parameters. By default, the file is assumed to be part of a model repo.
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json")
'/root/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade/config.json'
# Download from a dataset
hf_hub_download(repo_id="google/fleurs", filename="fleurs.py", repo_type="dataset")
'/root/.cache/huggingface/hub/datasets--google--fleurs/snapshots/199e4ae37915137c555b1765c01477c216287d34/fleurs.py'
From a specific version
By default, the latest version from the main branch is downloaded. However, in some cases you want to download a file at a particular version (e.g. from a specific branch, a PR, a tag or a commit hash). To do so, use the revision parameter:
# Download from the `v1.0` tag
hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="v1.0")
# Download from the `test-branch` branch
hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="test-branch")
# Download from Pull Request #3
hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="refs/pr/3")
# Download from a specific commit hash
hf_hub_download(repo_id="lysandre/arxiv-nlp", filename="config.json", revision="877b84a8f93f2d619faa2a6e514a32beef88ab0a")
Note: When using the commit hash, it must be the full-length hash instead of a 7-character commit hash.
Construct a download URL
If you want to construct the URL used to download a file from a repo, you can use hf_hub_url(), which returns the file's URL. Note that it is used internally by hf_hub_download().
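For example, a minimal sketch (by default the URL points at the resolve endpoint for the main revision):
from huggingface_hub import hf_hub_url
hf_hub_url(repo_id="lysandre/arxiv-nlp", filename="config.json")
'https://huggingface.co/lysandre/arxiv-nlp/resolve/main/config.json'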
Download an entire repository
snapshot_download() downloads an entire repository at a given revision. It uses hf_hub_download() internally, which means all downloaded files are also cached on your local disk. Downloads are made concurrently to speed up the process.
To download a whole repository, just pass the repo_id and repo_type:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="lysandre/arxiv-nlp")
'/home/lysandre/.cache/huggingface/hub/models--lysandre--arxiv-nlp/snapshots/894a9adde21d9a3e3843e6d5aeaaf01875c7fade'
# Or from a dataset
snapshot_download(repo_id="google/fleurs", repo_type="dataset")
'/home/lysandre/.cache/huggingface/hub/datasets--google--fleurs/snapshots/199e4ae37915137c555b1765c01477c216287d34'
snapshot_download() downloads the latest revision by default. If you want a specific repository revision, use the revision parameter:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="lysandre/arxiv-nlp", revision="refs/pr/1")
Filter files to download
snapshot_download() provides an easy way to download a repository. However, you don't always want to download the entire content of a repository. For example, you might want to prevent downloading all .bin files if you know you'll only use the .safetensors weights. You can do that using the allow_patterns and ignore_patterns parameters.
These parameters accept either a single pattern or a list of patterns. Patterns are Standard Wildcards (globbing patterns) as documented here. The pattern matching is based on fnmatch.
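As a quick illustration of how these patterns behave, using Python's standard fnmatch module directly:
import fnmatch
fnmatch.fnmatch("model.safetensors", "*.safetensors")  # True
fnmatch.fnmatch("pytorch_model.bin", "*.safetensors")  # False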
For example, you can use allow_patterns to only download JSON configuration files:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="lysandre/arxiv-nlp", allow_patterns="*.json")
On the other hand, ignore_patterns can exclude certain files from being downloaded. The following example ignores the .msgpack and .h5 file extensions:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="lysandre/arxiv-nlp", ignore_patterns=["*.msgpack", "*.h5"])
Finally, you can combine both to precisely filter your download. Here is an example that downloads all JSON and Markdown files except vocab.json.
from huggingface_hub import snapshot_download
snapshot_download(repo_id="gpt2", allow_patterns=["*.md", "*.json"], ignore_patterns="vocab.json")
Download file(s) to a local folder
By default, we recommend using the cache system to download files from the Hub. You can specify a custom cache location using the cache_dir parameter in hf_hub_download() and snapshot_download(), or by setting the HF_HOME environment variable.
However, if you need to download files to a specific folder, you can pass a local_dir parameter to the download function. This is useful to get a workflow closer to what the git command offers. The downloaded files will maintain their original file structure within the specified folder. For example, if filename="data/train.csv" and local_dir="path/to/folder", the resulting filepath will be "path/to/folder/data/train.csv".
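As a minimal sketch of that example (the repo_id here is hypothetical and used only for illustration):
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id="username/my-dataset",  # hypothetical repo
    repo_type="dataset",
    filename="data/train.csv",
    local_dir="path/to/folder",
)
# The file is now available at "path/to/folder/data/train.csv"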
A .cache/huggingface/ folder is created at the root of your local directory containing metadata about the downloaded files. This prevents re-downloading files if they're already up-to-date. If the metadata has changed, the new file version is downloaded. This makes local_dir optimized for pulling only the latest changes.
After completing the download, you can safely remove the .cache/huggingface/ folder if you no longer need it. However, be aware that re-running your script without this folder may result in longer recovery times, as metadata will be lost. Rest assured that your local data will remain intact and unaffected.
Don't worry about the .cache/huggingface/ folder when committing changes to the Hub! This folder is automatically ignored by both git and upload_folder().
Download from the CLI
You can use the huggingface-cli download command from the terminal to directly download files from the Hub. Internally, it uses the same hf_hub_download() and snapshot_download() helpers described above and prints the returned path to the terminal.
>>> huggingface-cli download gpt2 config.json
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
You can download multiple files at once, which displays a progress bar and returns the snapshot path in which the files are located:
>>> huggingface-cli download gpt2 config.json model.safetensors
Fetching 2 files: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 23831.27it/s]
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10
For more details about the CLI download command, please refer to the CLI guide.
Faster downloads
If you are running on a machine with high bandwidth, you can increase your download speed with hf_transfer, a Rust-based library developed to speed up file transfers with the Hub. To enable it:
Specify the hf_transfer extra when installing huggingface_hub (e.g. pip install huggingface_hub[hf_transfer]).
Set HF_HUB_ENABLE_HF_TRANSFER=1 as an environment variable.
hf_transfer is a power user tool! It is tested and production-ready, but it lacks user-friendly features like advanced error handling or proxies. For more details, please take a look at this section.
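As a minimal sketch, assuming the environment variable is set before huggingface_hub is first imported:
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # must be set before importing huggingface_hub

from huggingface_hub import snapshot_download
snapshot_download(repo_id="lysandre/arxiv-nlp")  # downloads now go through hf_transfer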

160
Instalation Transformer.txt Normal file

@ -0,0 +1,160 @@
Installation
Install 🤗 Transformers for whichever deep learning library you're working with, set up your cache, and optionally configure 🤗 Transformers to run offline.
🤗 Transformers is tested on Python 3.6+, PyTorch 1.1.0+, TensorFlow 2.0+, and Flax. Follow the installation instructions below for the deep learning library you are using:
PyTorch installation instructions.
TensorFlow 2.0 installation instructions.
Flax installation instructions.
Install with pip
You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, take a look at this guide. A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.
Start by creating a virtual environment in your project directory:
python -m venv .env
Activate the virtual environment. On Linux and macOS:
source .env/bin/activate
Activate the virtual environment on Windows:
.env/Scripts/activate
Now you're ready to install 🤗 Transformers with the following command:
pip install transformers
For CPU support only, you can conveniently install 🤗 Transformers and a deep learning library in one line. For example, install 🤗 Transformers and PyTorch with:
pip install 'transformers[torch]'
🤗 Transformers and TensorFlow 2.0:
pip install 'transformers[tf-cpu]'
M1 / ARM Users
You will need to install the following before installing TensorFlow 2.0:
brew install cmake
brew install pkg-config
🤗 Transformers and Flax:
pip install 'transformers[flax]'
Finally, check if 🤗 Transformers has been properly installed by running the following command. It will download a pretrained model:
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
The command then prints the label and score:
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]
Install from source
Install 🤗 Transformers from source with the following command:
pip install git+https://github.com/huggingface/transformers
This command installs the bleeding edge main version rather than the latest stable version. The main version is useful for staying up-to-date with the latest developments, for instance if a bug has been fixed since the last official release but a new release hasn't been rolled out yet. However, this means the main version may not always be stable. We strive to keep the main version operational, and most issues are usually resolved within a few hours or a day. If you run into a problem, please open an Issue so we can fix it even sooner!
Check if 🤗 Transformers has been properly installed by running the following command:
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"
Editable install
You will need an editable install if you'd like to:
Use the main version of the source code.
Contribute to 🤗 Transformers and need to test changes in the code.
Clone the repository and install 🤗 Transformers with the following commands:
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
These commands link the folder you cloned the repository into to your Python library paths. Python will now search the folder you cloned to, in addition to the normal library paths. For example, if your Python packages are typically installed in ~/anaconda3/envs/main/lib/python3.7/site-packages/, Python will also search the folder you cloned to: ~/transformers/.
You must keep the transformers folder if you want to keep using the library.
Now you can easily update your clone to the latest version of 🤗 Transformers with the following command:
cd ~/transformers/
git pull
Your Python environment will find the main version of 🤗 Transformers on the next run.
Install with conda
Install from the conda channel conda-forge:
conda install conda-forge::transformers
Cache setup
Pretrained models are downloaded and locally cached at: ~/.cache/huggingface/hub. This is the default directory given by the shell environment variable TRANSFORMERS_CACHE. On Windows, the default directory is given by C:\Users\username\.cache\huggingface\hub. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:
Shell environment variable (default): HUGGINGFACE_HUB_CACHE or TRANSFORMERS_CACHE.
Shell environment variable: HF_HOME.
Shell environment variable: XDG_CACHE_HOME + /huggingface.
🤗 Transformers will use the shell environment variables PYTORCH_TRANSFORMERS_CACHE or PYTORCH_PRETRAINED_BERT_CACHE if you are coming from an earlier iteration of this library and have set those environment variables, unless you specify the shell environment variable TRANSFORMERS_CACHE.
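As a minimal sketch, assuming the variable is set before transformers is first imported (the path is hypothetical):
import os
os.environ["HF_HOME"] = "/path/to/custom/cache"  # must be set before importing transformers

from transformers import pipeline  # models now download under the custom cache path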
Offline mode
Run 🤗 Transformers in a firewalled or offline environment with locally cached files by setting the environment variable HF_HUB_OFFLINE=1.
Add 🤗 Datasets to your offline training workflow with the environment variable HF_DATASETS_OFFLINE=1.
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path google-t5/t5-small --dataset_name wmt16 --dataset_config ro-en ...
This script should run without hanging or waiting to time out because it won't attempt to download the model from the Hub.
You can also bypass loading a model from the Hub in each from_pretrained() call with the local_files_only parameter. When set to True, only local files are loaded:
from transformers import T5Model
model = T5Model.from_pretrained("./path/to/local/directory", local_files_only=True)
Fetch models and tokenizers to use offline
Another option for using 🤗 Transformers offline is to download the files ahead of time, and then point to their local path when you need to use them offline. There are three ways to do this:
Download a file through the user interface on the Model Hub by clicking on the ↓ icon.
Use the PreTrainedModel.from_pretrained() and PreTrainedModel.save_pretrained() workflow:
Download your files ahead of time with PreTrainedModel.from_pretrained():
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")
Save your files to a specified directory with PreTrainedModel.save_pretrained():
tokenizer.save_pretrained("./your/path/bigscience_t0")
model.save_pretrained("./your/path/bigscience_t0")
Now when you're offline, reload your files with PreTrainedModel.from_pretrained() from the specified directory:
tokenizer = AutoTokenizer.from_pretrained("./your/path/bigscience_t0")
model = AutoModelForSeq2SeqLM.from_pretrained("./your/path/bigscience_t0")
Programmatically download files with the huggingface_hub library:
Install the huggingface_hub library in your virtual environment:
python -m pip install huggingface_hub
Use the hf_hub_download function to download a file to a specific path. For example, the following command downloads the config.json file from the T0 model to your desired path:
from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0")
Once your file is downloaded and locally cached, specify its local path to load and use it:
from transformers import AutoConfig
config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")

1654
Transformer Readme.txt Normal file

File diff suppressed because it is too large

483
main.py Normal file

@ -0,0 +1,483 @@
import tkinter as tk
from tkinter import ttk, filedialog, scrolledtext
from tkinter import messagebox
import torch
from transformers import AutoProcessor, WhisperForConditionalGeneration
import cv2
from datetime import timedelta
import os
import threading
import subprocess
import time
import re
import numpy as np
class VideoSubtitleApp:
    def __init__(self, root):
        self.root = root
        self.root.title("Extrator de Legendas")
        self.root.geometry("900x700")

        # State variables
        self.video_path = tk.StringVar()
        self.video_info = tk.StringVar()
        # Default must match a key of self.languages (the original 'pt-BR' was not a valid key)
        self.selected_language = tk.StringVar(value='Português (Brasil)')
        self.subtitles_list = []

        # Initialize the Whisper model and processor
        self.initialize_whisper()

        # Dictionary of available languages
        self.languages = {
            'Português (Brasil)': 'pt',
            'Português (Portugal)': 'pt',
            'English': 'en',
            'Español': 'es',
            'Français': 'fr',
            'Deutsch': 'de',
            'Italiano': 'it'
        }

        # Build the interface
        self.create_widgets()

        # Holds the cv2.VideoCapture object
        self.video = None
    def create_widgets(self):
        # Main frame
        main_frame = ttk.Frame(self.root, padding="10")
        main_frame.grid(row=0, column=0, sticky=(tk.W, tk.E, tk.N, tk.S))

        # Configure grid expansion
        self.root.grid_rowconfigure(0, weight=1)
        self.root.grid_columnconfigure(0, weight=1)
        main_frame.grid_columnconfigure(1, weight=1)

        # Frame for file selection and language
        file_frame = ttk.Frame(main_frame)
        file_frame.grid(row=0, column=0, columnspan=2, sticky=(tk.W, tk.E), pady=5)

        # File selection button
        ttk.Button(file_frame, text="Selecionar Vídeo", command=self.select_file).pack(side=tk.LEFT, padx=5)

        # Language selection
        ttk.Label(file_frame, text="Idioma:").pack(side=tk.LEFT, padx=5)
        language_combo = ttk.Combobox(file_frame,
                                      values=list(self.languages.keys()),
                                      textvariable=self.selected_language,
                                      state='readonly',
                                      width=20)
        language_combo.pack(side=tk.LEFT, padx=5)
        language_combo.set('Português (Brasil)')

        # Label showing the selected file path
        ttk.Label(main_frame, textvariable=self.video_path, wraplength=500).grid(row=1, column=0, columnspan=2, pady=5)

        # Frame for video information
        info_frame = ttk.LabelFrame(main_frame, text="Informações do Vídeo", padding="5")
        info_frame.grid(row=2, column=0, columnspan=2, sticky=(tk.W, tk.E), pady=5)
        ttk.Label(info_frame, textvariable=self.video_info).grid(row=0, column=0, sticky=tk.W)

        # Frame for action buttons
        button_frame = ttk.Frame(main_frame)
        button_frame.grid(row=3, column=0, columnspan=2, pady=5)
        ttk.Button(button_frame, text="Gerar Legendas", command=self.generate_subtitles).pack(side=tk.LEFT, padx=5)
        ttk.Button(button_frame, text="Salvar Alterações", command=self.save_subtitles).pack(side=tk.LEFT, padx=5)

        # Progress bar
        self.progress = ttk.Progressbar(main_frame, mode='indeterminate')
        self.progress.grid(row=4, column=0, columnspan=2, sticky=(tk.W, tk.E), pady=5)

        # Frame for subtitle editing
        subtitle_frame = ttk.LabelFrame(main_frame, text="Editor de Legendas", padding="5")
        subtitle_frame.grid(row=5, column=0, columnspan=2, sticky=(tk.W, tk.E, tk.N, tk.S), pady=5)
        subtitle_frame.grid_rowconfigure(0, weight=1)
        subtitle_frame.grid_columnconfigure(0, weight=1)

        # Editable text area for subtitles
        self.subtitle_text = scrolledtext.ScrolledText(subtitle_frame, height=20, width=80, wrap=tk.WORD)
        self.subtitle_text.grid(row=0, column=0, sticky=(tk.W, tk.E, tk.N, tk.S), padx=5, pady=5)

        # Usage instructions (displayed in the UI, kept in Portuguese)
        instructions = """Instruções:
1. Selecione o idioma do áudio do vídeo
2. Clique em 'Selecionar Vídeo' e escolha o arquivo
3. Aguarde o processamento do modelo Whisper
4. Edite as legendas se necessário
5. Clique em 'Salvar Alterações' para gerar o arquivo .srt"""
        ttk.Label(main_frame, text=instructions, justify=tk.LEFT, wraplength=600).grid(
            row=6, column=0, columnspan=2, pady=5, sticky=tk.W)
    def select_file(self):
        filetypes = (
            ('Arquivos de vídeo', '*.mp4 *.avi *.mkv'),
            ('Todos os arquivos', '*.*')
        )
        filename = filedialog.askopenfilename(
            title='Selecione um vídeo',
            filetypes=filetypes
        )
        if filename:
            self.video_path.set(filename)
            self.load_video_info(filename)
    def load_video_info(self, filename):
        try:
            self.video = cv2.VideoCapture(filename)

            # Read basic video properties
            fps = self.video.get(cv2.CAP_PROP_FPS)
            frame_count = int(self.video.get(cv2.CAP_PROP_FRAME_COUNT))
            duration = frame_count / fps if fps else 0  # guard against fps == 0 on unreadable files
            width = int(self.video.get(cv2.CAP_PROP_FRAME_WIDTH))
            height = int(self.video.get(cv2.CAP_PROP_FRAME_HEIGHT))

            info = f"""
Duração: {str(timedelta(seconds=int(duration)))}
Resolução: {width}x{height}
FPS: {fps:.2f}
Formato: {os.path.splitext(filename)[1]}
"""
            self.video_info.set(info)
        except Exception as e:
            messagebox.showerror("Erro", f"Erro ao carregar o vídeo: {str(e)}")
    def generate_subtitles(self):
        if not self.video_path.get():
            messagebox.showwarning("Aviso", "Por favor, selecione um vídeo primeiro.")
            return

        # Run the processing in a separate thread to keep the UI responsive
        self.progress.start()
        thread = threading.Thread(target=self.process_video)
        thread.start()
    def initialize_whisper(self):
        """Initialize the Whisper model and processor with optimized settings"""
        try:
            # Use the larger model for better quality
            model_name = "openai/whisper-large-v3"
            self.processor = AutoProcessor.from_pretrained(model_name)
            self.model = WhisperForConditionalGeneration.from_pretrained(
                model_name,
                device_map="auto",  # use the best available device
                torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
                low_cpu_mem_usage=True
            )
            if torch.cuda.is_available():
                print("Usando GPU para processamento")
            else:
                print("Usando CPU para processamento")
        except Exception as e:
            messagebox.showerror("Erro", f"Erro ao carregar modelo Whisper: {str(e)}")
    def extract_audio(self, video_path, audio_path):
        """Extract the audio track from the video with optimized settings"""
        process = None  # defined up front so the except block can inspect FFmpeg's stderr
        try:
            print(f"Extraindo áudio de {video_path}")

            # First attempt - maximum quality, with audio cleanup filters
            command = [
                'ffmpeg',
                '-i', video_path,
                '-vn',                   # drop the video stream
                '-acodec', 'pcm_s16le',  # 16-bit PCM codec
                '-ac', '1',              # mono
                '-ar', '16000',          # sampling rate expected by Whisper
                # Audio filters: boost volume, band-pass the speech range, trim leading/trailing silence
                '-af', 'volume=2.0,highpass=f=200,lowpass=f=3000,areverse,silenceremove=start_periods=1:start_duration=1:start_threshold=-60dB,areverse',
                '-y',                    # overwrite the output file
                audio_path
            ]
            print("Tentando primeira extração de áudio...")
            process = subprocess.run(
                command,
                capture_output=True,
                text=True,
                encoding='utf-8'
            )
            if process.returncode != 0:
                print("Primeira tentativa falhou, tentando método alternativo...")
                # Fallback command - simpler, no filters
                alt_command = [
                    'ffmpeg',
                    '-i', video_path,
                    '-vn',
                    '-acodec', 'pcm_s16le',
                    '-ac', '1',
                    '-ar', '16000',
                    '-y',
                    audio_path
                ]
                process = subprocess.run(
                    alt_command,
                    capture_output=True,
                    text=True,
                    encoding='utf-8'
                )
            if os.path.exists(audio_path) and os.path.getsize(audio_path) > 0:
                print(f"Áudio extraído com sucesso: {os.path.getsize(audio_path)} bytes")
                return True
            else:
                raise Exception("Arquivo de áudio não foi criado ou está vazio")
        except Exception as e:
            print(f"Erro detalhado na extração de áudio: {str(e)}")
            if process is not None and process.stderr:
                print(f"Erro FFmpeg: {process.stderr}")
            return False
    def process_audio_with_whisper(self, audio_path, language_code):
        try:
            import soundfile as sf  # lazy import so the app can start without soundfile

            print(f"Processando áudio em {language_code}...")

            # Load the audio
            audio, sample_rate = sf.read(audio_path)
            print(f"Áudio carregado: {len(audio)} amostras, taxa de amostragem: {sample_rate}Hz")

            # Normalize integer audio to float32 in [-1, 1]
            if audio.dtype == np.int16:
                audio = audio.astype(np.float32) / 32768.0
            elif audio.dtype == np.int32:
                audio = audio.astype(np.float32) / 2147483648.0
            max_abs = np.max(np.abs(audio))
            if max_abs > 1.0:
                audio = audio / max_abs

            # Prepare input features with explicit settings
            inputs = self.processor(
                audio,
                sampling_rate=sample_rate,
                return_tensors="pt",
                padding=True,
                do_normalize=True,
                return_attention_mask=True
            )
            print("Features de entrada processadas")

            # Move to the GPU if available, matching the model's float16 dtype
            if torch.cuda.is_available():
                inputs = inputs.to("cuda", torch.float16)
                print("Dados movidos para GPU")

            # Generation parameters (names follow the transformers Whisper generate API;
            # the threshold options only take effect for long-form transcription)
            generate_kwargs = {
                "temperature": 0.0,  # deterministic decoding
                "no_speech_threshold": 0.6,
                "logprob_threshold": -1.0,
                "compression_ratio_threshold": 2.4,
                "condition_on_prev_tokens": True,
                "return_timestamps": True
            }
            if language_code:
                generate_kwargs["language"] = language_code

            print("Iniciando geração da transcrição...")
            # Generate the transcription with timestamps
            with torch.no_grad():
                outputs = self.model.generate(
                    inputs.input_features,
                    **generate_kwargs
                )

            print("Transcrição gerada, decodificando...")
            # Decode the output; with output_offsets=True, batch_decode returns
            # dicts with "text" and per-segment "offsets"
            transcription = self.processor.batch_decode(
                outputs,
                skip_special_tokens=True,
                output_offsets=True
            )[0]
            print(f"Transcrição decodificada: {len(transcription['text'])} caracteres")
            if not transcription['text'].strip():
                raise Exception("Transcrição vazia retornada pelo modelo")

            # Format the segments as SRT entries
            segments = []
            for i, segment in enumerate(transcription['offsets'], start=1):
                start_time = self.format_timestamp(segment['timestamp'][0])
                end_time = self.format_timestamp(segment['timestamp'][1])
                text = segment['text'].strip()
                if text:  # only add non-empty segments
                    segment_str = f"{i}\n{start_time} --> {end_time}\n{text}\n\n"
                    segments.append(segment_str)
            print(f"Segmentos formatados: {len(segments)}")
            return segments
        except Exception as e:
            print(f"Erro detalhado no processamento do áudio: {str(e)}")
            raise Exception(f"Erro no processamento do áudio: {str(e)}")
    def format_timestamp(self, seconds):
        """Convert seconds to the SRT timestamp format (HH:MM:SS,mmm)"""
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        seconds = seconds % 60
        milliseconds = int((seconds % 1) * 1000)
        seconds = int(seconds)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{milliseconds:03d}"
    def format_whisper_output(self, transcription):
        """Format raw Whisper output (bracketed timestamps) as SRT segments"""
        segments = []
        pattern = r"\[(\d+:\d+\.\d+) --> (\d+:\d+\.\d+)\](.*?)(?=\[|$)"
        matches = re.finditer(pattern, transcription, re.DOTALL)
        for idx, match in enumerate(matches, 1):
            start_time = match.group(1)
            end_time = match.group(2)
            text = match.group(3).strip()

            # Convert to the SRT format
            start_time = self.convert_timestamp_to_srt(start_time)
            end_time = self.convert_timestamp_to_srt(end_time)
            segment = f"{idx}\n{start_time} --> {end_time}\n{text}\n\n"
            segments.append(segment)
        return segments

    def convert_timestamp_to_srt(self, timestamp):
        """Convert a Whisper MM:SS.ms timestamp to the SRT HH:MM:SS,mmm format"""
        minutes, seconds = timestamp.split(":")
        seconds, milliseconds = seconds.split(".")
        hours = int(minutes) // 60
        minutes = int(minutes) % 60
        # Pad/truncate the fractional part to exactly three digits before formatting
        milliseconds = milliseconds.ljust(3, "0")[:3]
        return f"{hours:02d}:{minutes:02d}:{int(seconds):02d},{milliseconds}"
    def process_video(self):
        # Defined before the try block so the finally clause can always reference it
        audio_path = "temp_audio.wav"
        try:
            # Extract the audio
            print("Iniciando extração de áudio...")
            if not self.extract_audio(self.video_path.get(), audio_path):
                raise Exception("Falha na extração do áudio")
            print("Áudio extraído com sucesso")

            # Resolve the language code
            selected_name = self.selected_language.get()
            language_code = self.languages.get(selected_name, 'en')
            print(f"Idioma selecionado: {selected_name} ({language_code})")

            # Run speech recognition with Whisper
            print("Iniciando reconhecimento de fala...")
            self.subtitles_list = self.process_audio_with_whisper(audio_path, language_code)
            if not self.subtitles_list:
                raise Exception("Nenhum texto foi reconhecido")
            print(f"Texto reconhecido com sucesso: {len(self.subtitles_list)} segmentos")

            # Show the subtitles in the UI (scheduled on the main thread)
            self.root.after(0, self.update_subtitle_text, ''.join(self.subtitles_list))
        except Exception as e:
            print(f"Erro no processamento: {str(e)}")
            self.root.after(0, messagebox.showerror, "Erro", f"Erro ao gerar legendas: {str(e)}")
        finally:
            # Clean up
            self.root.after(0, self.progress.stop)
            if self.video is not None:
                self.video.release()
            try:
                if os.path.exists(audio_path):
                    print(f"Removendo arquivo temporário: {audio_path}")
                    os.remove(audio_path)
            except Exception as e:
                print(f"Erro ao remover arquivo temporário: {str(e)}")
    def update_subtitle_text(self, text):
        self.subtitle_text.delete(1.0, tk.END)
        self.subtitle_text.insert(tk.END, text)

    def save_subtitles(self):
        try:
            # Grab the current text from the editor
            current_text = self.subtitle_text.get(1.0, tk.END).strip()

            # Validate the basic subtitle format
            if not self.validate_subtitle_format(current_text):
                raise ValueError("Formato de legendas inválido. Mantenha o formato: número + tempo + texto")

            # Save next to the video, with the .srt extension
            output_path = os.path.splitext(self.video_path.get())[0] + ".srt"
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(current_text)
            messagebox.showinfo("Sucesso", f"Legendas salvas com sucesso em:\n{output_path}")
        except Exception as e:
            messagebox.showerror("Erro", f"Erro ao salvar legendas: {str(e)}")
    def validate_subtitle_format(self, text):
        """Improved validation of the subtitle format"""
        if not text.strip():
            return False
        lines = text.split('\n')
        i = 0
        while i < len(lines):
            if not lines[i].strip():
                i += 1
                continue

            # Validate the subtitle number
            if not lines[i].strip().isdigit():
                return False

            # Validate the time line
            i += 1
            if i >= len(lines):
                return False
            time_line = lines[i].strip()
            if not (' --> ' in time_line and
                    time_line.count(':') == 4 and
                    len(time_line.split(' --> ')) == 2):
                return False

            # Validate the subtitle text
            i += 1
            if i >= len(lines) or not lines[i].strip():
                return False
            i += 1
        return True
if __name__ == "__main__":
    root = tk.Tk()
    app = VideoSubtitleApp(root)
    root.mainloop()