Welcome to DarkSmurfSub.com



Hardsubs Extraction Guide



#1 jamiepeach

jamiepeach

    Uploader Team Leader

  • Retired Staff
  • Others:
    CTS Elite
    CTS Translator (C)
    SCT Timer
    SCT Uploader

  • PipPipPipPipPip
  • 551 posts
    • Time Online: 118d 18h 40m 12s
  • Local Time: Apr 26 17 05:34 PM
    • Referred By:Undisclosed
    • Star sign:Unknown

    Posted 11 Apr 12 - 05:39 AM


    This is a guide on how to rip/extract hardsubs.

    Additional details are being added to the guide. (Last Update: 2012-04-28)
    If you have any suggestions, corrections or additions, feel free to make any changes, or contact me.
    This guide will focus on how to extract Chinese hardsubs as it pertains to DSS.
    The same process can be used on other language hardsubs as well.


    Requirements: Windows-based computer, AVISubDetector/esrXP, appropriate video codecs, time, patience
    Optional: AviSynth, OCR software

    If you would like to help out in this process, please contact zestybeta888.


    1st post: Introduction
    2nd post: Preparing the video file
    3rd post: AVISubDetector
    4th post: esrXP
    5th post: Converting BMP to GIF
    6th post: OCR Software


    A few explanations before we get into the details:


    hardsubs: Subtitles that are encoded directly into the video image. Hardsubs cannot be disabled or altered.
    softsubs: Subtitles that are stored separately from the video image. Softsubs can be enabled/disabled. Depending on the format of the softsubs, the way that they appear onscreen may be customizable.
    transcribe: The process of putting something into written form. In this context, transcription refers to typing out the text from an image.

    Softsubs are preferred for translating because they can be put through the DSS machine translation system, and because text is searchable.

    Software can assist in the process of extracting hardsubs from a video. The software will not be 100% accurate, so the subtitles will still need to be verified afterwards to ensure that no lines were missed and that the timing is acceptable. The software extracts images and timing information. From there, the text in the extracted images is transcribed or put through OCR to produce softsubs.


    Steps in creating softsubs from a hardsubbed video


    • Obtain hardsubbed video - Given the choice, use high-resolution video, since its subtitles are likely to be clearer. Keep in mind that higher resolution also requires more computation.
    • Optional: Preprocess the video – Transcode the video into a compatible format. Filters can also be applied to the video to improve subtitle detection.
    • Process video using extraction software - AVISubDetector and esrXP will be discussed below. These applications are only compatible with Windows 2000/XP/Vista/7. They extract images and timing information from the video.
    • Optional: Clean up extracted images - The software may detect text where there is none, and it can also produce repeated lines. Duplicate and empty images can be cleaned up before uploading the images to DSS, or after the transcribe process.
    • Save SRT file and extracted images - Use an appropriate format for the images (e.g. GIF, BMP). Name the images according to the following format:


      TP.QS.Title.Year-E01.#####.bmp
      e.g. TP.QS.King.Gwanggaeto.2011-E78.orig.00000.bmp
      TP. = Transcribe Project, QS. = Korean Drama
      Note: the numbers and file extension will be generated automatically by the program

    • Optional: Convert BMP files to GIF files to reduce the image file size.
    • Compress the images into an archive - Only ZIP compression is currently supported. To create a ZIP archive in Windows Explorer, select all the image files, right-click any of them and choose Send to > Compressed (zipped) folder.

      Note: Name the ZIP archive with the project name. Place the images in the root directory of the ZIP archive (i.e. compress just the images, not a folder containing them). [Not certain if this is necessary, but it is what works for me.]
    • Upload the SRT file and the ZIP archive - Please ask for additional instructions. Projects should be named according to the following format:

      TP.QS.Title.Year-E01
      e.g. TP.QS.King.Gwanggaeto.2011-E78


      SRT file details
      • The SRT file should have the image filename as the line content (remove any path information by doing a Find & Replace). If you're using AVISubDetector, a line can be composed of more than one image.
      • Keep the file extension of the image as BMP even if you convert the image to GIF format.
      Example SRT file:

      1
      00:00:24,833 --> 00:00:26,969
      TP.QS.K-pop.The.Ultimate.Audition.2012-E06.00001.bmp
      
      2
      00:00:27,102 --> 00:00:30,506
      TP.QS.K-pop.The.Ultimate.Audition.2012-E06.00002.bmp
      
    • Verify the softsubs – After the transcribe process is complete, download the softsubs and compare them with the hardsubbed video. Fix any typos, timing issues and/or missing lines.
    Aside: Optical character recognition (OCR) is currently not used because the accuracy of the output varies. There are many applications capable of OCR. For Simplified Chinese, I used the 尚书七号 application to process the images (thanks to sean666 at Area11 for creating a guide for doing this).
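
    If you'd rather not do the Find & Replace by hand, the path clean-up for the SRT file can be scripted. The Python sketch below is only an illustration: strip_image_paths is a made-up helper, not part of any tool in this guide, and it assumes each image reference sits on its own line and ends in .bmp.

```python
import ntpath
import re

# A line that consists only of an image reference, e.g. C:\proj\SubPic\X.00001.bmp
IMAGE_LINE = re.compile(r"^\s*(.+\.bmp)\s*$", re.IGNORECASE)

def strip_image_paths(srt_text: str) -> str:
    """Replace full image paths in an SRT with bare filenames (hypothetical helper)."""
    out = []
    for line in srt_text.splitlines():
        match = IMAGE_LINE.match(line)
        # ntpath.basename understands both "\" and "/" as separators,
        # so Windows paths work even when the script runs elsewhere.
        out.append(ntpath.basename(match.group(1)) if match else line)
    return "\n".join(out) + "\n"
```

    Index and timing lines are passed through untouched, so the SRT structure stays the same; only the image lines lose their folder prefix.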

    Color Models


    Knowledge about RGB and HSL color models is useful in configuring filters. A filter is a device that separates things. In this context we want to separate the text from the background.
    Wikipedia: RGB (red, green, blue), HSL (hue, saturation, lightness)
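
    As a rough illustration of how RGB and HSL values can separate text from the background, here is a Python sketch of a per-pixel test in the spirit of the filter options described in the esrXP post (Min/Max limits on lightness and saturation, plus RGB and hue difference from a reference color). It is not esrXP's actual code. The assumptions are that "RGB Difference" means the largest per-channel difference, that hue is measured in degrees, and that lightness and saturation are scaled to 0-100; the function names are hypothetical.

```python
import colorsys

def to_hsl(rgb):
    """Convert an (r, g, b) tuple with 0-255 channels to (hue deg, sat %, lum %)."""
    r, g, b = (c / 255 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note: colorsys returns H, L, S order
    return h * 360, s * 100, l * 100

def hue_distance(h1, h2):
    """Shortest distance between two hues on the color wheel (0-180 degrees)."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def passes(pixel, ref, rgb_diff=None, hue_diff=None,
           lum_min=None, lum_max=None, sat_min=None, sat_max=None):
    """True if `pixel` survives every enabled test (None means disabled)."""
    hue, sat, lum = to_hsl(pixel)
    # Assumed meaning of "RGB Difference": largest per-channel difference.
    if rgb_diff is not None and max(abs(a - b) for a, b in zip(pixel, ref)) > rgb_diff:
        return False
    if hue_diff is not None and hue_distance(hue, to_hsl(ref)[0]) > hue_diff:
        return False
    if lum_min is not None and lum < lum_min:  # Min alone acts as a high-pass
        return False
    if lum_max is not None and lum > lum_max:  # Max alone acts as a low-pass
        return False
    if sat_min is not None and sat < sat_min:
        return False
    if sat_max is not None and sat > sat_max:
        return False
    return True
```

    For white subtitles with a black outline, a call like passes(px, (0, 0, 0), rgb_diff=172, lum_max=70) keeps dark outline pixels such as (20, 20, 20) and rejects white ones, which is the idea behind separating the outline from the text.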


    Follow DSS on Facebook, Tumblr, Twitter, Google+, Pinterest.


    #2 jamiepeach


    Posted 11 Apr 12 - 05:52 AM

    Preparing the video file


    To ensure that the video can be read by the program, remove any non-ASCII characters (e.g. Chinese characters) from the filename. Renaming the video to follow the project name format might be a good idea.
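
    If you have many files to rename, a short Python sketch can do it. It assumes you simply want to drop non-ASCII characters and turn the remaining spaces into dots (in the spirit of the project name format). ascii_filename and rename_to_ascii are hypothetical helper names, the "video" fallback name is an arbitrary choice, and you still need to add the TP.QS. prefix yourself.

```python
import re
import unicodedata
from pathlib import Path

def ascii_filename(name: str) -> str:
    """Drop non-ASCII characters from a filename and turn spaces into dots."""
    # NFKD splits accented letters (é -> e + accent) so the base letter survives;
    # characters with no ASCII form, such as Chinese, are dropped entirely.
    ascii_only = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    stem, dot, ext = ascii_only.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = ascii_only, ""
    stem = re.sub(r"\s+", ".", stem.strip()).strip(".") or "video"
    return f"{stem}.{ext}" if ext else stem

def rename_to_ascii(path: Path) -> Path:
    """Rename a file on disk to its ASCII-only name and return the new path."""
    target = path.with_name(ascii_filename(path.name))
    if target != path:
        path.rename(target)
    return target
```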

    The hardsubbed video must be in a compatible format. AVISubDetector will only work with AVI files and AviSynth files. (I'm not certain about esrXP's limitations.) Also, ensure that your computer has the appropriate codecs installed.

    If the video is interlaced, applying some sort of de-interlacing on the video might be helpful. Interlaced video can be identified by the combing effect that is exhibited in scenes with a lot of motion.

    AviSynth (Download Page)


    AviSynth is a useful tool that works as a frameserver. After installing AviSynth, you can create an AviSynth script that allows you to open non-AVI videos in AVISubDetector. If you only want to process a portion of a video, you can also use the Trim() function to specify the frames you want. Many other capabilities are documented on its website.

    DirectShowSource("C:\Path\video.rmvb", fps=29.97, seek=true, audio=false, video=true) # load rmvb
    ConvertToRGB24() # convert to RGB24 color model
    Trim(62640,70000) # retain frames 62640-70000 inclusive
    
    You may need to customize the script to fit your needs. To create a script, start a new text document, write the script, and save the file with the .avs file extension.
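
    Trim() takes frame numbers rather than times, so a small helper can do the conversion and write the .avs file for you. This Python sketch assumes a constant frame rate (29.97 fps, as in the script above) and rounds to the nearest frame; time_to_frame, avs_script and write_avs are hypothetical names.

```python
from pathlib import Path

def time_to_frame(timestamp: str, fps: float = 29.97) -> int:
    """Convert "HH:MM:SS" or "HH:MM:SS.mmm" to the nearest frame number."""
    hours, minutes, seconds = timestamp.split(":")
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total * fps)

def avs_script(video: str, first: int, last: int, fps: float = 29.97) -> str:
    """Build the same three-line AviSynth script as above for a frame range."""
    return (
        f'DirectShowSource("{video}", fps={fps}, seek=true, audio=false, video=true)\n'
        "ConvertToRGB24()\n"
        f"Trim({first},{last})\n"
    )

def write_avs(path: str, video: str, start: str, end: str, fps: float = 29.97) -> None:
    """Write an .avs file covering the time range start..end."""
    script = avs_script(video, time_to_frame(start, fps), time_to_frame(end, fps), fps)
    # ASCII only: fails loudly if the path still contains Chinese characters.
    Path(path).write_text(script, encoding="ascii")
```

    For example, time_to_frame("00:34:50") gives 62637 at 29.97 fps, close to the 62640 used in the script above. Pass Windows paths as raw strings, e.g. r"C:\Path\video.rmvb".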

    Transcoding Software


    If AviSynth does not work, you will need to transcode the video into a compatible format. There are many applications that can be used to transcode video.

    zestybeta888's recommendations

    Aiseesoft Total Video Converter: DOWNLOAD LINK
    This software can convert videos such as .f4v/.flv files to AVI and other popular formats. A license key is provided on the download page, so it is essentially free. However, it can't convert .rmvb files.

    If you're looking for a free tool to convert RMVB to AVI, here's another link you might want to try: RMVB converter



    #3 jamiepeach


    Posted 11 Apr 12 - 05:59 AM

    AVISubDetector (Download Page)


    A guide to extract English hardsubs can be found here (not everything will be applicable for Chinese subtitles): (PDF)

    The instructions below have been provided courtesy of DarkSmurf:

    • Download and install AVISubDetector (Download Page)
    • Download the following file and unzip it: new-default-settings2.zip (382 bytes)

      The new-default-settings2.sdt file inside the archive is the saved ripping settings that have worked well so far. (They should be tested to see if they work well with your video. You may need to customize the settings to suit your needs.)
    • Open the program. Set the main project settings. These settings are retained between sessions.


    • To start the ripping process and save the images:
    • Type in the file name of the project, using the format: TP.QS.Title.Year-E01
    • Load the .avi file (or AviSynth .avs script).
    • Click on the Settings tab. Click Load Settings and load the new-default-settings2.sdt file you downloaded earlier. Change "Skip change if distance is less than 2 frames" to 5.


    • Click Start (Full) to begin. Leave all other settings as they are, and make sure you are in Automatic Mode (the default). The program should start detecting and ripping the subtitles. I suggest clicking on the PreOCR tab so that the video doesn't have to play.
    • Once ripping is complete, you will see several folders in the project directory that you set up.

      The two most important are:
      SubPic – the extracted .bmp images
      Text – the generated .srt subtitle file

    • The SubPic directory will contain a lot of images, but you only need the .bmp files with .orig. in the filename.

      Example: TP.QS.King.Gwanggaeto.2011-E78.orig.00000.bmp


      The rest are duplicates. The .orig. files contain just the ripped text region. Select all the *.orig.*.bmp files and zip them up. You may also want to convert the BMP files into GIF format to reduce the file size (see this post). You can now proceed to upload a transcribe project.
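
    Selecting and zipping the .orig. images can also be done with a short Python script instead of Windows Explorer. This sketch uses only the standard library; zip_orig_images is a hypothetical name, and it assumes the SubPic layout described above with lowercase .bmp extensions (glob matching is case-sensitive on some systems). Each file is added under its bare name, so the images sit in the root of the archive as recommended in the first post.

```python
import zipfile
from pathlib import Path

def zip_orig_images(subpic_dir: str, project: str) -> Path:
    """Zip only the *.orig.*.bmp files from SubPic into <project>.zip, flat."""
    src = Path(subpic_dir)
    images = sorted(src.glob("*.orig.*.bmp"))  # skip the duplicate images
    archive = src.parent / f"{project}.zip"    # saved next to the SubPic folder
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for img in images:
            # arcname=img.name stores the file at the archive root (no folder)
            zf.write(img, arcname=img.name)
    return archive
```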
    Program Overview
    There is no manual for this software. The creator of this program (Shalcker) describes the workings of the program in the following threads: Thread 1, Thread 2

    Project Tab – General project and program settings.

    Subtitles Tab – Displays the subtitles.


    The timing and stats for detected subtitles are displayed in the grid. The stats can also be viewed as a graph on the Stats tab.

    Colors Tab – Illustrates the color settings.


    Colors can be set to filter out the background from the text. The colors can be set on the Colors tab or the Settings tab (under Color Domination).

    Settings Tab – Where most of the magic occurs. (See below for details.)

    Stats Tab – Displays graphs of the stats calculated by the program.


    The graphs can be analyzed to improve settings. The x-axis corresponds to the frames in the video.

    PreOCR Tab – Displays the images that have been captured.


    This tab can be selected while the program is running to display the captured images.

    OCR Tab – Displays OCR options.


    I haven't used this aspect of the program, and it's not likely to be very effective with Chinese characters, so I will skip over this part.

    Settings Details


    The algorithm used to detect subtitles and line changes is slightly mathy. The assumption is that if there are subtitles in a given frame, there will be high contrast in the image. The author of this program refers to this criterion as the image being 'sharp'.

    How the algorithm works:

    The program analyzes each frame by computing a number of statistics. These statistics are used to determine whether there are subtitles in the image, and whether or not there has been a line change.

    • Two rows of checkboxes appear at the top.
      • The Preview row is used to select which images appear in the preview area.
      • The Settings row is used to select which options appear on the Settings tab.
    • Sample – The 'Open Bitmap' button allows the user to select a bitmap (BMP image) for analysis.
    • The preview area displays the images that were selected in the Preview row.
    • Crop – Use the sliders to set the area that you want analyzed, or select one of the presets at the bottom.
    • Color Domination – Click on the Crop Settings image to select a color. Shift-click to add a new color. Click on the icon to toggle between T/O/X options.
    • Drop Values – Set the tolerance for minimum amount of contrast between pixels. If the contrast falls below the minimum, the difference is set to zero.
    • Blocks – Set the criteria for 'sharp' blocks.
    • Lines – Set the criteria for 'sharp' images.
    • Detection Settings – Most of the settings are summarized in this pane.

    Although the settings are inter-related, detection can be separated into two tasks:
    • Detecting the presence of subtitles.
      • Crop: [Full | 1/2 | 1/3 | 1/4 | 1/5] are presets for cropping. The Crop Top and Crop Bottom fields can be changed to customize the crop area.
      • Drop Values: The slider at the top changes the distance between the pixels being compared. Enable Y Diff if you want pixels to be compared in the vertical direction as well. The minimum difference can be set for each individual color channel, as well as for an aggregate of the three channels.
      • Block Value: The value that the aggregate value of pixels in a block must exceed to be considered 'sharp'.
      • Block Size: The width of a block (in pixels)
      • Block Count: The number of sharp blocks in a line for a line to be considered 'sharp'.
      • Line Count: The number of lines in a frame for the frame to be considered 'sharp'.
      • CenterW: A multiplier for values corresponding to pixels in the center of the frame. (Subtitles are usually centered. This parameter can be used to increase the detection rates of short lines.)
    • Detecting line changes. (The current frame is compared with the previous frame.)
      The relevant settings are listed under the Tracking Changes heading.
      • DLC – Detected Line Count: Change in the number of sharp lines. Useful for detecting changes between 1-line subtitles and 2-line subtitles, or vice versa.
      • L/RMB – Leftmost/Rightmost Block: Change in position (in pixels) of the leftmost and rightmost blocks in each line.
      • MED – Average Blocks per Line:
      • MBC – maximum block count: I believe there's a bug in the program for this parameter.
      • LBC – lines with same block count: I only get zeros from this parameter, so I believe it wasn't fully implemented.
      • Skip change if distance is less than X frames – Changes are ignored if a change is detected before X subsequent frames have been analyzed. This parameter helps to reduce the number of falsely detected changes by ensuring that each subtitle lasts for at least X frames.
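
    To make the 'sharp' idea a bit more concrete, here is a Python sketch of one possible reading of the settings above. It is not Shalcker's code. It treats a "line" as one pixel row of the cropped frame, uses only horizontal differences (no Y Diff, no CenterW weighting) and a single aggregate drop value for all three channels, and flags a change whenever the sharp-line count changes (a crude stand-in for DLC). All names and default values are made up for illustration.

```python
def row_diffs(row, distance=1, drop=60):
    """Contrast between pixels `distance` apart in one row; below `drop` -> 0."""
    diffs = []
    for x in range(len(row) - distance):
        d = sum(abs(a - b) for a, b in zip(row[x], row[x + distance]))
        diffs.append(d if d >= drop else 0)
    return diffs

def count_sharp_blocks(diffs, block_size=8, block_value=200):
    """Number of blocks whose summed contrast exceeds block_value."""
    return sum(1 for i in range(0, len(diffs), block_size)
               if sum(diffs[i:i + block_size]) > block_value)

def sharp_line_count(frame, distance=1, drop=60, block_size=8,
                     block_value=200, block_count=2):
    """How many rows have at least block_count sharp blocks.
    `frame` is a list of rows, each a list of (r, g, b) tuples."""
    lines = 0
    for row in frame:
        diffs = row_diffs(row, distance, drop)
        if count_sharp_blocks(diffs, block_size, block_value) >= block_count:
            lines += 1
    return lines

def has_subtitle(frame, line_count=1, **settings):
    """A frame is 'sharp' (likely has subtitles) if enough of its rows are sharp."""
    return sharp_line_count(frame, **settings) >= line_count

def detect_changes(line_counts, min_distance=5):
    """Frame indices where the sharp-line count changes, ignoring changes that
    come less than min_distance frames after the last accepted change."""
    changes, last = [], None
    for i in range(1, len(line_counts)):
        if line_counts[i] != line_counts[i - 1]:
            if last is None or i - last >= min_distance:
                changes.append(i)
                last = i
    return changes
```

    With min_distance=5, a count sequence like 0, 0, 2, 2, 0, 2, ... only reports the change at frame 2; the flicker at frames 4 and 5 is ignored, which is what the "Skip change" setting is meant to achieve.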


    More forthcoming



    #4 jamiepeach


    Posted 14 Apr 12 - 05:57 PM

    esrXP (Download Page – warning: don't click on the misleading banner ads on this site)


    English and Chinese explanations of the software can be found at this site: (Link)

    You can download these settings for reference. You will probably need to customize them to suit your needs. These are registry files: extract the archive, double-click the file, and confirm that you would like to add the registry settings. The settings will then show up in the program when you run it.

    Settings from Area11 – AREA11esrXP-scaningSETUP.zip (2.9 KB)
    Settings used for K-pop Ultimate Audition – esrXP_settings_KPUA.zip (1.55 KB)

    One characteristic of this software is that the original color images cannot be saved; only the filtered black and white images can. The black and white images are smaller than color images and can be used with OCR; however, they may be more difficult to read when transcribing manually.

    [Image: Main window.]



    Typical Workflow

    • File > Open Video... – From the dialog window, select the video file or AviSynth file you wish to work on. The program may stall for a moment while the video is loading.
    • Subtitle > Filter – A window opens showing the video image. This window is used to set single-frame detection options.

      Use the seek bar at the bottom to find a frame with subtitles.

      If you don't see an image in the filter window, your computer probably doesn't have the proper codecs installed. In my case, I installed the K-Lite Mega Codec Pack to fix the issue, although one of the smaller bundles might have been sufficient. (You may already have a codec pack installed on your computer. Installing more than one codec pack is not recommended, as they may conflict.)

      Create a box to specify the area you want analyzed. A box that is tight around the subtitles is less likely to generate false positives.
      • Left-click on the image to set the top-left corner of the box.
      • Right-click on the image to set the bottom-right corner of the box.
      • The Region controls in the left-hand pane can also be used to adjust the size and location of the box.
      • Clicking the center of the Region controls toggles a horizontal guide that divides the selection area into two halves. If the subtitles appear on two lines, the guide can be used to center the selection area. If you intend to use OCR, it is a good idea to center the selection area between the two lines; otherwise, it's not too important.

      [Image: The highlighted area denotes the selected area.]


      Filter Setting Controls

      Filters can be designed specifically for a certain style of subtitle. This methodology will generate fewer false positives, but also means that different filters will need to be created to fit different subtitle styles. The filters can also be designed to be more general so that they work with many different styles, but more false positives will likely be generated.

      Enable Filter – display the filtered output. The desired output will have the text in white, and other areas in black.
      Additional Color – when Enable Filter is selected, the filter will be displayed with multiple colors to illustrate the 3 filtering stages.
      • Red for areas that pass through the Outline filter
      • Green for areas that are not red and are not classified as text
      • Black for areas that are not red and do not pass through the Pass1 filter
      • Blue for areas that are not red and do not pass through the Final filter
      • White for areas that are not red, and pass through the Pass1 and Final filters and postprocessing settings

      Select Color from the drop-down list if the subtitles do not have an outline.
      Select Color + Outline from the drop-down list if the subtitles have an outline.

      Subtitle Color – Opens a color selection window. Select the color of the subtitles.
      Outline Color – Opens a color selection window. Select the color of the subtitle outline.

      [Image: Color selection window.]


      Advance – Opens a dialog for configuring the 3 filtering stages. Have Enable Filter and Additional Color selected to visualize your settings. If you are trying new settings, it is easier to uncheck all the filtering options first, and add each filter one at a time.



      Click on the checkbox to enable the desired filtering options.
      • Hue Difference and RGB Difference are dependent on the Subtitle Color or Outline Color that was selected. (The Outline filter uses Outline Color, while Pass1 and Final filters use Subtitle Color. If you don't use any of the Difference settings, it doesn't matter what color setting you choose.) All pixels within the difference threshold pass through the filter.
      • Enable Min to create a high-pass filter (everything below the minimum is filtered out). If the outline is white, you can set a high-pass filter to retain the bright colors and eliminate the dark colors.
      • Enable Max to create a low-pass filter (everything above the maximum is filtered out). If the outline is black, you can set a low-pass filter to eliminate the bright colors and retain the dark colors.
      • Enable both Min and Max to create a band-pass filter (everything below the minimum or above the maximum is filtered out).
      • Pixel Compensate sets the tolerance level on areas colored in black (from Pass1). Set at zero, if there is even one pixel that is classified as black in an area, that area will not be classified as text.
      Allowable values:
      Pixel Compensate [0 – 6]
      Hue Difference [0 – 181]
      RGB Difference [0 – 255]
      Lum, Sat [0 – 100] (Min must be less than Max)

      Postprocessing – Opens a dialog to select clean-up options that are applied after the image is filtered (see the guide linked above for more details).



      Filtering Example:

      [Image: Unfiltered selection.]


      In this example, the outline and subtitle colors have been selected as black and white respectively.

      [Image: Selecting Enable Filter and Additional Color without any filtering options enabled results in the selected area being filled with red.]


      Stage 1: Outline – Filter out areas not associated with the outline. The outline will show up in red; other areas will show up in green or white. You don't want the red to bleed into the text. The aim is to set up a filter that only lets the pixels of the outline pass through. If all the boxes are unchecked, all the pixels pass through the filter, meaning that every pixel is treated as part of the outline. Sometimes colors in the background will be similar to the subtitle outline and will show up red as well (which is perfectly normal).

      [Image: For the Outline filter, Lum Max and RGB Difference are enabled and set to 70 and 172 respectively.]


      Stage 2: Pass1 – Filter out areas not associated with subtitles. Areas that are filtered out will show up in black. This filter does not have to be too accurate.

      After stage 2, all areas that are fully surrounded by red and contain no black are filled in with white. The Pixel Compensate parameter can be increased to provide some tolerance for black pixels; any areas within the tolerance will be filled with white as well.

      [Image: For the Pass1 filter, RGB Difference and Sat Max are enabled and set to 81 and 10 respectively. Pixel Compensate is set to 1.]


      Stage 3: Final – Filter the areas that were filled in with white. Areas that are filtered out will show up in blue. This filter can be used for fine-tuning which areas are detected.

      [Image: For the Final filter, Sat Max is enabled and set to 6.]


      [Image: A black and white filtered image can be seen with Additional Color disabled.]


      [Image: Filtered image after postprocessing options are enabled (remove block touch edge, remove single pixel dot, remove block larger than 15x15 pixels).]


      [Image: Unfiltered selection.]


      Comment
      Filtering settings may work well in one frame, but they might not work well with other frames. It's a good idea to verify that the filter works well at many points in the video and to leave some tolerance when filtering.
    • Subtitle > Rip Option – A small dialog box presents options for frame-to-frame detection.

      Skipping more frames speeds up processing, but makes the timing less accurate.
      Pixel Difference and Ignore Change (%) settings modify the sensitivity of when a new line is detected.


    • Start – Click the Start button at the bottom, then sit back and wait for the program to finish. The time required to process the video depends on the selected options, the video, and your computer's processing power. It usually takes me about 30 minutes to 1 hour to process a 350MB rmvb file (but this is on a 5-year-old laptop running Windows as a virtual machine, so it could be much faster on your computer). You may click Stop at any time, but the program cannot resume and will need to start again from the beginning. The captured images will start appearing on the right-hand side. If the images are not good, stop the program and modify the settings.



      The program will apply the filter and postprocessing settings to all the frames that it analyzes. If text is detected in a frame (areas classified as white), the frame will be compared to adjacent frames to see if it's a new line, or the continuation of a line. If many frames without text are detected as having text, you might want to adjust the filter settings. If the program is not detecting line changes properly (many duplicate lines, or missing lines) you may wish to adjust the rip options.
    • Subtitle > Manager – The manager displays all the extracted images. You can go through the images and delete any unnecessary ones.

      • Left-click an image to highlight it for deletion. You can also click-and-drag to highlight multiple images. Left-click on the image again to deselect it.
      • To delete a series of images, right-click on the first one, and left-click on the last one.
      • To merge a duplicate line, right-click on the first one, and right-click on the last one. You will see arrows on the right side of the first and last images to indicate the range. The last line will be highlighted in a different shade; this indicates which image will be retained when the duplicates are deleted. Left-click any image within the range to select a different image to be retained. Right-click on any image in the range to cancel the merge. By default, the manager will only allow you to merge lines if the timing is continuous. To override this, check the "Force Merge" box in the toolbar.
      • After selecting the images for deletion, click on the red X in the toolbar to commit the deletion.
      • Use Recover Filtered on areas where text was erroneously filtered out.
      • Use Remove Passed to remove falsely detected areas.

      [Image: Manager window.]


      [Image: Images highlighted for deletion.]


      [Image: Merging a range of images: note the blue arrows on the right. The lighter-colored image will be retained.]

    • File > Save As...

      From Save as type, you can select the format you wish to save in.
      esrXP (*.esr) – save the state of the current session
      SubRip with bitmap (*.srt) – save images and .srt file

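
    The three color-coded filter stages from the Filtering Example can be modelled roughly as follows. This Python sketch is an interpretation of the description, not esrXP's algorithm. It takes three boolean masks (which pixels pass the Outline, Pass1 and Final filters), treats a region as "fully surrounded by red" when it is a 4-connected group of non-red pixels that does not touch the edge of the selection, and leaves out postprocessing. The function name classify and the one-letter labels are hypothetical.

```python
from collections import deque

def classify(outline, pass1, final, pixel_compensate=0):
    """Label each pixel of the selection after the three filter stages.

    outline, pass1, final: 2-D lists of booleans, True where the pixel passes
    that filter. Returns a 2-D list of labels: 'R' red (outline), 'K' black
    (fails Pass1), 'G' green (not text), 'W' white (text), 'B' blue (text
    area that fails Final).
    """
    h, w = len(outline), len(outline[0])
    # Stages 1 and 2: outline pixels are red; the rest start as green or black.
    labels = [["R" if outline[y][x] else ("G" if pass1[y][x] else "K")
               for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] == "R" or seen[sy][sx]:
                continue
            # Flood-fill one 4-connected region of non-red pixels.
            region, blacks, touches_edge = [], 0, False
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                blacks += labels[y][x] == "K"
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_edge = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and labels[ny][nx] != "R"):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Enclosed by red and within the black-pixel tolerance: text.
            if not touches_edge and blacks <= pixel_compensate:
                for y, x in region:
                    # Stage 3: text pixels that fail the Final filter turn blue.
                    labels[y][x] = "W" if final[y][x] else "B"
    return labels
```

    Raising pixel_compensate from 0 to 1 lets a region containing a single black pixel still be filled with white, which matches how Pixel Compensate is described above.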

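    As a rule of thumb for the frame-skipping trade-off in Rip Option: if only every (skip + 1)th frame is checked, a change that happens between two checked frames is noticed up to skip frames late. The Python sketch below does that arithmetic; it assumes the checked frames are evenly spaced (I haven't confirmed how esrXP schedules them), and the function names are made up.

```python
import math

def max_timing_error_ms(skip: int, fps: float = 29.97) -> float:
    """Worst-case delay (in ms) before a change is noticed when `skip`
    frames are skipped between the frames that are actually checked."""
    return skip * 1000 / fps

def frames_examined(total_frames: int, skip: int) -> int:
    """Number of frames analysed when every (skip + 1)th frame is checked."""
    return math.ceil(total_frames / (skip + 1))
```

    For example, at 29.97 fps skipping 2 frames means only about a third of the frames are analysed, and a subtitle may start up to about 67 ms later than it does in the video.
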
    Saving Your Settings and Batch Processing

    On the right end of the toolbar in the main window, there is a combo box, a Save button and a Delete button. Type a description in the combo box and click Save to save your current settings. Use the combo box to switch between your saved settings.

    After saving your settings, you can set up batch processing (under the File menu). This is helpful because you will typically need to process a video once for the dialog, a second time for explanations, and a third time for lyrics.


    Select the saved setting you wish to use from the drop-down list, and click Add Video to add jobs to the list.

    Exporting Images for OCR
    If you wish to use OCR, export the images using Save OCR Image.

    If subtitles are captured on two lines, select the checkbox.


    See this post for further details.

    Creating an SRT File with text instead of image filenames
    On the left side of the main window, you can type out the subtitles with each line corresponding to one image. If you used OCR, you can paste the OCR text into this area as well. Choose SRT from the Save As... dialog box to create an SRT file with text instead of the image filenames.
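
    If you already have an image-based SRT (like the example in the first post) and a text file with one transcribed line per image, the two can also be merged outside esrXP. This Python sketch assumes blank-line-separated SRT entries and exactly one text line per entry, in the same order; fill_srt_text is a hypothetical name.

```python
import re

def fill_srt_text(srt_text: str, lines: list[str]) -> str:
    """Swap each SRT entry's content (e.g. an image filename) for a text line."""
    normalized = srt_text.replace("\r\n", "\n").strip()  # tolerate Windows line endings
    entries = [e for e in re.split(r"\n\s*\n", normalized) if e.strip()]
    if len(entries) != len(lines):
        raise ValueError(f"{len(entries)} SRT entries but {len(lines)} text lines")
    out = []
    for entry, text in zip(entries, lines):
        index, timing = entry.splitlines()[:2]  # keep the number and timing as-is
        out.append(f"{index}\n{timing}\n{text}")
    return "\n\n".join(out) + "\n"
```

    The length check is deliberate: if a line was dropped during transcription or OCR clean-up, it is better to stop than to shift every subtitle by one.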



    #5 jamiepeach


    Posted 14 Apr 12 - 05:57 PM

    Converting BMP to GIF


    The file size of the output images can be quite large if they are in BMP format. You can use a batch conversion tool to convert the images into GIF format.

    If you're using Adobe Photoshop, you can record an action and use batch processing to accomplish this task. Instructions can be found here: external link

    GIMP is a free, open-source alternative to Adobe Photoshop and can do the same using scripts. There are also many programs dedicated to image file format conversion.

    zestybeta888's Recommendation:
    You might want to check out this batch conversion tool that I'm currently using. It's easy to use and FREE: LINK



    #6 jamiepeach


    Posted 14 Apr 12 - 05:57 PM

    OCR Software



    You can find the original guide in Chinese here: external link

    尚书七号 is suitable for recognizing Simplified Chinese. Note that this application is in Chinese. Depending on your system configuration, Chinese might not be displayed properly in the program, so you may have to follow these instructions.

    I have only tested this software using images generated by esrXP, where the text is separated from the background.

    [Screenshot: Main window]


    In Windows Explorer, select all the images you want analyzed. Click and hold the first image, then drag the whole selection into the list on the left. If you drag by any image other than the first, the images will not appear in the right order.

    [Screenshot: Select all images to be processed]


    Once the images are loaded, select all of them in the list (click the first one, then hold Shift and click the last one). Then click the icon with the reading glasses to run OCR.

    [Screenshot: OCR text is displayed on the right]


    Once OCR has completed, select all the images on the left again. In the menu bar, choose 输出(P) ("Output") > 到指定格式文件(F) ("To file in specified format") to export the recognized lines to a file.

    The exported file is saved in a Simplified Chinese encoding (most likely GBK/GB2312, Windows code page 936), so open it in a text editor with that encoding selected to view the text properly.
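    If your editor makes switching encodings awkward, a small Python sketch can re-save the export as UTF-8 once. It assumes the export is GBK (code page 936); that is an inference from "encoded for Simplified Chinese", not something documented for 尚书七号, and the function name reencode_to_utf8 is hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src: str, dst: str, source_encoding: str = "gbk") -> str:
    """Re-save a text file as UTF-8 and return the decoded text.

    Assumes the OCR export is GBK; pass another codec such as "gb18030"
    if decoding fails or the text looks wrong.
    """
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return text
```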

    The OCR output must be checked for errors and will likely need cleaning up. The guide at Area11 suggests using Find and Replace in a text editor or word processor to quickly remove extra spaces, line breaks, and stray symbols.
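    The same Find and Replace cleanup can be scripted. This Python sketch assumes every subtitle line is entirely Chinese (so all whitespace can go; it would merge English words), and its STRAY_SYMBOLS set is a hypothetical guess at common OCR noise that you should adjust to your own output. It does not fix misrecognized characters, so the output still needs checking by eye.

```python
import re

# Hypothetical set of characters that tend to appear as OCR noise; edit to taste.
STRAY_SYMBOLS = re.compile(r"[|_~`^*#@\\/<>\[\]{}]")


def clean_ocr_text(text: str) -> str:
    """Remove whitespace, stray symbols and empty lines from OCR output."""
    cleaned = []
    for line in text.splitlines():
        line = re.sub(r"\s+", "", line)  # also removes full-width spaces (U+3000)
        line = STRAY_SYMBOLS.sub("", line)
        if line:  # drop lines that end up empty
            cleaned.append(line)
    return "\n".join(cleaned)
```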



    #7 DarkSmurf

    Posted 25 Apr 12 - 04:56 AM

    Whoa... Nice work on this hardsub ripping guide :)


    #8 JCW1960

    Posted 06 Jun 12 - 03:36 AM

    Excellent work... best guide on extracting hardsubs so far.

    #9 vern

    Posted 07 Jun 12 - 05:58 AM

    Wow... really impressive! The time you must have spent putting this together is commendable. I wish I could find a guide this comprehensive for Mac.

    #10 jamiepeach

    Posted 11 Jun 12 - 12:25 AM

    vern wrote: "Wow... really impressive! The time you must have spent putting this together is commendable. I wish I could find a guide this comprehensive for Mac."


    I use a Mac as well. Unfortunately, I haven't come across any software that will run on Mac OS X. If anyone knows of any, I would like to hear about it too.

    #11 giddybird

    Posted 13 Sep 12 - 07:43 AM

    Can someone tell me whether the AviSubDetector program can detect Chinese hardsubs in RMVB files?

    #12 jamiepeach

    Posted 13 Sep 12 - 04:42 PM

    giddybird wrote: "Can someone tell me whether the AviSubDetector program can detect Chinese hardsubs in RMVB files?"


    ASD can only read AVI files (or AviSynth scripts). As a workaround, you can open the RMVB file through AviSynth with a script similar to the one described above, or transcode the RMVB file to AVI format.
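    For the AviSynth route, this Python sketch writes a one-line wrapper script next to an RMVB file so that ASD can open the .avs instead. It uses AviSynth's DirectShowSource filter, which may differ from the script described earlier in this guide, and it assumes a DirectShow decoder for RealMedia is installed. The 25 fps default is a placeholder assumption: check your file's frame rate. The function name write_avs_wrapper is hypothetical.

```python
from pathlib import Path


def write_avs_wrapper(video: str, fps: float = 25.0) -> Path:
    """Write an AviSynth script next to `video` and return the script's path.

    Assumptions: AviSynth is installed, a DirectShow decoder for RealMedia is
    available, and `fps` matches the source (25.0 is only a placeholder).
    """
    src = Path(video).resolve()
    # convertfps=true turns variable-frame-rate video into constant-rate frames;
    # audio is not needed for subtitle detection
    script = f'DirectShowSource("{src}", fps={fps}, convertfps=true, audio=false)\n'
    avs = src.with_suffix(".avs")
    avs.write_text(script)
    return avs
```

    Open the resulting .avs file in ASD in place of the RMVB. If ASD still cannot read it, transcoding the RMVB to AVI remains the fallback.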

    #13 flashxml

    Posted 18 Sep 15 - 01:30 PM

    The link is dead, and I can't find this software (尚书七号) anywhere...

    Please help me.